Unix Text Processing
HOWARD W. SAMS & COMPANY
HAYDEN BOOKS
Related Titles
Advanced C Primer++
  Stephen Prata, The Waite Group
Discovering MS-DOS®
  Kate O’Day, The Waite Group
Microsoft® C Programming for the IBM®
  Robert Lafore, The Waite Group
MS-DOS® Bible
  Steven Simrin, The Waite Group
MS-DOS® Developer’s Guide
  John Angermeyer and Kevin Jaeger, The Waite Group
UNIX® System V Bible
  Stephen Prata and Donald Martin, The Waite Group
UNIX® Communications
  Bryan Costales, The Waite Group
C with Excellence: Programming Proverbs
  Henry F. Ledgard with John Tauer
C Programmer’s Guide to Serial Communications
  Joe Campbell
Hayden Books
For the retailer nearest you, or to order directly from the publisher,
call 800-428-SAMS. In Indiana, Alaska, and Hawaii call 317-298-5699.
CONSULTING EDITORS:
HAYDEN BOOKS
A Division of Howard W. Sams & Company
4300 West 62nd Street
Indianapolis, Indiana 46268 USA
Copyright © 1987 O’Reilly & Associates, Inc.
FIRST EDITION
SECOND PRINTING - 1988
All rights reserved. No part of this book shall be reproduced, stored in a retrieval system, or
transmitted by any means, electronic, mechanical, photocopying, recording, or otherwise, without
written permission from the publisher. No patent liability is assumed with respect to the use of
the information contained herein. While every precaution has been taken in the preparation of
this book, the publisher assumes no responsibility for errors or omissions. Neither is any liability
assumed for damages resulting from the use of the information contained herein.
International Standard Book Number: 0-672-46291-5
Library of Congress Catalog Card Number: 87-60537
Acquisitions Editor: Therese Zak
Editor: Susan Pink Bussiere
Cover: Visual Graphic Services, Indianapolis
Design by Jerry Bates
Illustration by Patrick Sarles
Typesetting: O’Reilly & Associates, Inc.
Printed in the United States of America
Trademark Acknowledgements
All terms mentioned in this book that are known to be trademarks or service marks are listed
below. Howard W. Sams & Co. cannot attest to the accuracy of this information. Use of a term
in this book should not be regarded as affecting the validity of any trademark or service mark.
Apple is a registered trademark and Apple Laserwriter is a trademark of Apple Computer, Inc.
devps is a trademark of Pipeline Associates, Inc.
Merge/286 and Merge/386 are trademarks of Locus Computing Corp.
DDL is a trademark of Imagen Corp.
Helvetica and Times Roman are registered trademarks of Allied Corp.
IBM is a registered trademark of International Business Machines Corp.
Interpress is a trademark of Xerox Corp.
LaserJet is a trademark of Hewlett-Packard Corp.
Laserwriter is a trademark of Apple Computer, Inc.
Linotronic is a trademark of Allied Corp.
Macintosh is a trademark licensed to Apple Computer, Inc.
Microsoft is a registered trademark of Microsoft Corp.
MKS Toolkit is a trademark of Mortice Kern Systems, Inc.
Multimate is a trademark of Multimate International Corp.
Nutshell Handbook is a trademark of O’Reilly & Associates, Inc.
PC-Interface is a trademark of Locus Computing Corp.
PostScript is a trademark of Adobe Systems, Incorporated.
PageMaker is a registered trademark of Aldus Corporation.
SoftQuad Publishing Software and SQtroff are trademarks of SoftQuad Inc.
WordStar is a registered trademark of MicroPro International Corp.
UNIX is a registered trademark of AT&T.
VP/ix is a trademark of Interactive Systems Corp. and Phoenix Technologies, Ltd.
CONTENTS
Preface
A Workspace
Tools for Editing
Document Formatting
Printing
Other UNIX Text-Processing Tools
2 UNIX Fundamentals
3 Learning vi
6 The mm Macros
Using tbl
Specifying Tables
A Simple Table Example
Laying Out a Table
Describing Column Formats
Changing the Format within a Table
Putting Text Blocks in a Column
Breaking Up Long Tables
Putting Titles on Tables
A tbl Checklist
Some Complex Tables
Invoking awk
Records and Fields
Testing Fields
Passing Parameters from a Shell Script
Changing the Field Separator
System Variables
Looping
awk Applications
Testing Programs
14 Writing nroff and troff Macros
Comments
Defining Macros
Macro Names
Macro Arguments
Nested Macro Definitions
Conditional Execution
Interrupted Lines
Number Registers
Defining Strings
Diversions
Environment Switching
Redefining Control and Escape Characters
Debugging Your Macros
Error Handling
Macro Style
Index
Preface
Many people think of computers primarily as “number crunchers,” and think of word
processors as generating form letters and boilerplate proposals. That computers can be
used productively by writers, not just research scientists, accountants, and secretaries, is
not so widely recognized. Today, writers not only work with words, they work with
computers and the software programs, printers, and terminals that are part of a computer
system.
The computer has not simply replaced a typewriter; it has become a system for
integrating many other technologies. As these technologies are made available at a rea-
sonable cost, writers may begin to find themselves in new roles as computer program-
mers, systems integrators, data base managers, graphic designers, typesetters, printers,
and archivists.
The writer functioning in these new roles is faced with additional responsibilities.
Obviously, it is one thing to have a tool available and another thing to use it skillfully.
Like a craftsman, the writer must develop a number of specialized skills, gaining con-
trol over the method of production as well as the product. The writer must look for
ways to improve the process by integrating new technologies and designing new tools
in software.
In this book, we want to show how computers can be used effectively in the
preparation of written documents, especially in the process of producing book-length
documents. Surely it is important to learn the tools of the trade, and we will demon-
strate the tools available in the UNIX environment. However, it is also valuable to
examine text processing in terms of problems and solutions: the problems faced by a
writer undertaking a large writing project and the solutions offered by using the
resources and power of a computer system.
In Chapter 1, we begin by outlining the general capabilities of word-processing
systems. We describe in brief the kinds of things that a computer must be able to do
for a writer, regardless of whether that writer is working on a UNIX system or on an
IBM PC with a word-processing package such as WordStar or MultiMate. Then, having
defined basic word-processing capabilities, we look at how a text-processing system
includes and extends these capabilities and benefits. Last, we introduce the set of text-
processing tools in the UNIX environment. These tools, used individually or in combi-
nation, provide the basic framework for a text-processing system, one that can be
custom-tailored to supply additional capabilities.
Chapter 2 gives a brief review of UNIX fundamentals. We assume you are
already somewhat acquainted with UNIX, but we included this information to make
sure that you are familiar with basic concepts that we will be relying on later in the
book.
Chapter 3 introduces the vi editor, a basic tool for entering and editing text.
Although many other editors and word-processing programs are available with UNIX,
vi has the advantage that it works, without modification, on almost every UNIX system
and with almost every type of terminal. If you learn vi, you can be confident that
your text editing skills will be completely transferable when you sit down at someone
else’s terminal or use someone else’s system.
Chapter 4 introduces the nroff and troff formatting programs. Because
vi is a text editor, not a word-processing program, it does only rudimentary formatting
of the text you enter. You can enter special formatting codes to specify how you want
the document to look, then format the text using either nroff or troff. (The
nroff formatter is used for formatting documents to the screen or to typewriter-like
printers; troff uses much the same formatting language, but has additional constructs
that allow it to produce more elaborate effects on typesetters and laser printers.)
In this chapter, we also describe the different types of output devices for printing
your finished documents. With the wider availability of laser printers, you need to
become familiar with many typesetting terms and concepts to get the most out of
troff’s capabilities.
The formatting markup language required by nroff and troff is quite complex,
because it allows detailed control over the placement of every character on the
page, as well as a large number of programming constructs that you can use to define
custom formatting requests or macros. A number of macro packages have been
developed to make the markup language easier to use. These macro packages define
commonly used formatting requests for different types of documents, set up default
values for page layout, and so on.
Although someone working with the macro packages does not need to know
about the underlying requests in the formatting language used by nroff and troff,
we believe that the reader wants to go beyond the basics. As a result, Chapter 4 intro-
duces additional basic requests that the casual user might not need. However, your
understanding of what is going on should be considerably enhanced.
There are two principal macro packages in use today, ms and mm (named for the
command-line options to nroff and troff used to invoke them). Both macro
packages were available with most UNIX systems; now, however, ms is chiefly available
on UNIX systems derived from Berkeley 4.x BSD, and mm is chiefly available on
UNIX systems derived from AT&T System V. If you are lucky enough to have both
macro packages on your system, you can choose which one you want to learn. Otherwise,
you should read either Chapter 5, The ms Macros, or Chapter 6, The mm Macros,
depending on which version you have available.
different styles of section headings, page headers, footers, and so on. We’ll also talk
about how to generate an automatic table of contents and index, two tasks that take
you beyond troff into the world of shell programming and various UNIX text-processing
utilities.
To complete these tasks, we need to return to the UNIX shell in Chapter 18 and
examine in more detail the ways that it allows you to incorporate the many tools pro-
vided by UNIX into an integrated text-processing environment.
Numerous appendices summarize information that is spread throughout the text,
or that couldn’t be crammed into it.
***
Before we turn to the subject at hand, a few acknowledgements are in order. Though
only two names appear on the cover of this book, it is in fact the work of many hands.
In particular, Grace Todino wrote the chapters on tbl and eqn in their entirety, and
the chapters on vi and ex are based on the O’Reilly & Associates Nutshell Handbook,
Learning the Vi Editor, written by Linda Lamb. Other members of the O’Reilly
& Associates staff (Linda Mui, Valerie Quercia, and Donna Woonteiler) helped tirelessly
with copyediting, proofreading, illustrations, typesetting, and indexing.
Donna was new to our staff when she took on responsibility for the job of
copyfitting, that final stage in page layout made especially arduous by the many figures
and examples in this book. She and Linda especially spent many long hours getting
this book ready for the printer. Linda had the special job of doing the final consistency
check on examples, making sure that copyediting changes or typesetting errors
had not compromised the accuracy of the examples.
Special thanks go to Steve Talbott of Masscomp, who first introduced us to the
power of troff and who wrote the first version of the extended ms macros, format
shell script, and indexing mechanism described in the second half of this book.
Steve’s help and patience were invaluable during the long road to mastery of the UNIX
text-processing environment.
We’d also like to thank Teri Zak, the acquisitions editor at Hayden Books, for her
vision of the Hayden UNIX series, and this book’s place in it.
In the course of this book’s development, Hayden was acquired by Howard Sams,
where Teri’s role was taken over by Jim Hill. Thanks also to the excellent production
editors at Sams, Wendy Ford, Lou Keglovitz, and especially Susan Pink Bussiere,
whose copyediting was outstanding.
Through it all, we have had the help of Steve Kochan and Pat Wood of Pipeline
Associates, Inc., consulting editors to the Hayden UNIX Series. We are grateful for
their thoughtful and thorough review of this book for technical accuracy. (We must, of
course, make the usual disclaimer: any errors that remain are our own.)
Steve and Pat also provided the macros to typeset the book. Our working drafts
were printed on an HP LaserJet printer, using ditroff and TextWare International’s
tplus postprocessor. Final typeset output was prepared with Pipeline Associates’
devps, which was used to convert ditroff output to PostScript, which was used in
turn to drive a Linotronic L100 typesetter.
CHAPTER 1
From Typewriters to Word Processors
Before we consider the special tools that the UNIX environment provides for text pro-
cessing, we need to think about the underlying changes in the process of writing that are
inevitable when you begin to use a computer.
The most important features of a computer program for writers are the ability to
remember what is typed and the ability to allow incremental changes: no more retyping
from scratch each time a draft is revised. For a writer first encountering word-
processing software, no other features even begin to compare. The crudest command
structure, the most elementary formatting capabilities, will be forgiven because of the
immense labor savings that take place.
Writing is basically an iterative process. It is a rare writer who dashes out a fin-
ished piece; most of us work in circles, returning again and again to the same piece of
prose, adding or deleting words, phrases, and sentences, changing the order of thoughts,
and elaborating a single sentence into pages of text.
A writer working on paper periodically needs to clear the deck: to type a clean
copy, free of elaboration. As the writer reads the new copy, the process of revision
continues, a word here, a sentence there, until the new draft is as obscured by changes
as the first. As Joyce Carol Oates is said to have remarked: “No book is ever finished.
It is abandoned.”
Word processing first took hold in the office as a tool to help secretaries prepare
perfect letters, memos, and reports. As dedicated word processors were replaced with
low-cost personal computers, writers were quick to see the value of this new tool. In a
civilization obsessed with the written word, it is no accident that WordStar, a word-
processing program, was one of the first best sellers of the personal computer revolu-
tion.
As you learn to write with a word processor, your working style changes.
Because it is so easy to make revisions, it is much more forgivable to think with your
fingers when you write, rather than to carefully outline your thoughts beforehand and
polish each sentence as you create it.
If you do work from an outline, you can enter it first, then write your first draft by
filling in the outline, section by section. If you are writing a structured document such
as a technical manual, your outline points become the headings in your document; if
you are writing a free-flowing work, they can be subsumed gradually in the text as you
flesh them out. In either case, it is easy to write in small segments that can be moved
as you reorganize your ideas.
Watching a writer at work on a word processor is very different from watching a
writer at work on a typewriter. A typewriter tends to enforce a linear flow: you must
write a passage and then go back later to revise it. On a word processor, revisions are
constant: you type a sentence, then go back to change the sentence above. Perhaps
you write a few words, change your mind, and back up to take a different tack; or you
decide the paragraph you just wrote would make more sense if you put it ahead of the
one you wrote before, and move it on the spot.
This is not to say that a written work is created on a word processor in a single
smooth flow; in fact, the writer using a word processor tends to create many more drafts
than a compatriot who still uses a pen or typewriter. Instead of three or four drafts, the
writer may produce ten or twenty. There is still a certain editorial distance that comes
only when you read a printed copy. This is especially true when that printed copy is
nicely formatted and letter perfect.
This brings us to the second major benefit of word-processing programs: they
help the writer with simple formatting of a document. For example, a word processor
may automatically insert carriage returns at the end of each line and adjust the space
between words so that all the lines are the same length. Even more importantly, the
text is automatically readjusted when you make changes. There are probably commands
for centering, underlining, and boldfacing text.
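The automatic line filling and justification described here can be sketched in a few lines of Python. This is only an illustration of the idea, not a description of how any particular word processor works; the function name `fill_and_justify` and the choice to give the extra spaces to the leftmost gaps are assumptions made for the example.

```python
def fill_and_justify(text, width):
    """Break text into lines of at most `width` columns, then pad the gaps
    between words so each line except the last is exactly `width` wide.
    Lines holding a single word are left as they are."""
    lines, current = [], []
    for word in text.split():
        # Length of the line if this word were added with single spaces.
        candidate = len(" ".join(current + [word]))
        if current and candidate > width:
            lines.append(current)
            current = [word]
        else:
            current.append(word)
    if current:
        lines.append(current)

    out = []
    for i, line in enumerate(lines):
        is_last = i == len(lines) - 1
        if is_last or len(line) == 1:
            out.append(" ".join(line))  # last line stays ragged
            continue
        gaps = len(line) - 1
        spare = width - sum(len(w) for w in line)
        base, extra = divmod(spare, gaps)
        pieces = []
        for j, word in enumerate(line[:-1]):
            # The first `extra` gaps receive one more space than the rest.
            pieces.append(word + " " * (base + (1 if j < extra else 0)))
        pieces.append(line[-1])
        out.append("".join(pieces))
    return out
```

Because the function works from the whole text each time, calling it again after an edit gives the readjustment that the paragraph describes: every line is refilled, not just the one that changed.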
The rough formatting of a document can cover a multitude of sins. As you read
through your scrawled markup of a preliminary typewritten draft, it is easy to lose track
of the overall flow of the document. Not so when you have a clean copy: the flaws of
organization and content stand out vividly against the crisp new sheets of paper.
However, the added capability to print a clean draft after each revision also puts
an added burden on the writer. Where once you had only to worry about content, you
may now find yourself fussing with consistency of margins, headings, boldface, italics,
and all the other formerly superfluous impedimenta that have now become integral to
your task.
As the writer gets increasingly involved in the formatting of a document, it
becomes essential that the tools help revise the document’s appearance as easily as its
content. Given these changes imposed by the evolution from typewriters to word pro-
cessors, let’s take a look at what a word-processing system needs to offer to the writer.
A Workspace
One of the most important capabilities of a word processor is that it provides a space in
which you can create documents. In one sense, the video display screen on your termi-
nal, which echoes the characters you type, is analogous to a sheet of paper. But the
workspace of a word processor is not so unambiguous as a sheet of paper wound into a
typewriter, which may be added neatly to the stack of completed work when finished, or
torn out and crumpled as a false start. From the computer’s point of view, your
workspace is a block of memory, called a buffer, that is allocated when you begin a
word-processing session. This buffer is a temporary holding area for storing your work
and is emptied at the end of each session.
To save your work, you have to write the contents of the buffer to a file. A file is
a permanent storage area on a disk (a hard disk or a floppy disk). After you have saved
your work in a file, you can retrieve it for use in another session.
When you begin a session editing a document that exists on file, a copy of the file
is made and its contents are read into the buffer. You actually work on the copy, mak-
ing changes to it, not the original. The file is not changed until you save your changes
during or at the end of your work session. You can also discard changes made to the
buffered copy, keeping the original file intact, or save multiple versions of a document
in separate files.
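The relationship between the buffer and the file can be made concrete with a small Python sketch. The class name `EditBuffer`, its methods, and the `demo` helper are hypothetical names chosen for illustration; real editors manage their buffers in more elaborate ways.

```python
import os
import tempfile


class EditBuffer:
    """An in-memory copy of a file's lines; the file changes only on save()."""

    def __init__(self, path):
        self.path = path
        self.revert()

    def revert(self):
        # Re-read the file, discarding any changes made to the buffer.
        with open(self.path) as f:
            self.lines = f.read().splitlines()

    def save(self, path=None):
        # Write the buffer back to the original file, or to a different
        # name to keep a separate version of the document.
        target = path or self.path
        with open(target, "w") as f:
            f.write("\n".join(self.lines) + "\n")


def demo():
    """Show that edits reach the disk only when the buffer is saved."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "draft.txt")
        with open(path, "w") as f:
            f.write("first draft\n")
        buf = EditBuffer(path)
        buf.lines.append("a new sentence")
        with open(path) as f:
            before_save = f.read()
        buf.save()
        with open(path) as f:
            after_save = f.read()
    return before_save, after_save
```

In this sketch, calling `revert()` instead of `save()` would throw away the appended line and leave the original file untouched, and `save("draft2.txt")` would keep both versions.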
Particularly when working with larger documents, the management of disk files
can become a major effort. If, like most writers, you save multiple drafts, it is easy to
lose track of which version of a file is the latest.
An ideal text-processing environment for serious writers should provide tools for
saving and managing multiple drafts on disk, not just on paper. It should allow the
writer to
Most word-processing programs for personal computers seem to work best for short
documents such as the letters and memos that offices churn out by the millions each
day. Although it is possible to create longer documents, many features that would help
organize a large document such as a book or manual are missing from these programs.
However, long before word processors became popular, programmers were using
another class of programs called text editors. Text editors were designed chiefly for
entering computer programs, not text. Furthermore, they were designed for use by computer
professionals, not computer novices. As a result, a text editor can be more difficult
to learn, lacking many on-screen formatting features available with most word processors.
Nonetheless, the text editors used in program development environments can pro-
vide much better facilities for managing large writing projects than their office word-
processing counterparts. Large programs, like large documents, are often contained in
many separate files; furthermore, it is essential to track the differences between versions
of a program.
UNIX is a pre-eminent program development environment and, as such, it is also
a superb document development environment. Although its text editing tools at first
may appear limited in contrast to sophisticated office word processors, they are in fact
considerably more powerful.
To make changes to a document, you must be able to move to that place in the text
where you want to make your edits. Most documents are too large to be displayed in
their entirety on a single terminal screen, which generally displays 24 lines of text.
Usually only a portion of a document is displayed. This partial view of your document
is sometimes referred to as a window.* If you are entering new text and reach the bottom
line in the window, the text on the screen automatically scrolls (rolls up) to reveal
an additional line at the bottom. A cursor (an underline or block) marks your current
position in the window.
There are basically two kinds of movement:
When you begin a session, the first line of text is the first line in the window, and the
cursor is positioned on the first character. Scrolling commands change which lines are
displayed in the window by moving forward or backward through the document.
Cursor-positioning commands allow you to move up and down to individual lines, and
along lines to particular characters.
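As a rough model of the two kinds of commands, the following Python sketch treats the window as a fixed-height slice of the document. The class `Window` and its method names are assumptions made for this example, and the document is assumed to contain at least one line.

```python
class Window:
    """A fixed-height view onto a longer list of lines (assumed non-empty)."""

    def __init__(self, lines, height=24):
        self.lines = lines
        self.height = height
        self.top = 0   # index of the first document line shown
        self.row = 0   # index of the document line holding the cursor
        self.col = 0   # cursor column within that line

    def visible(self):
        # The lines currently displayed on the screen.
        return self.lines[self.top:self.top + self.height]

    def scroll(self, n):
        # Scrolling: change which lines are shown, staying inside the document.
        max_top = max(0, len(self.lines) - self.height)
        self.top = min(max(self.top + n, 0), max_top)
        # Keep the cursor on a line that is still visible.
        bottom = min(self.top + self.height, len(self.lines)) - 1
        self.row = min(max(self.row, self.top), bottom)

    def move_to(self, row, col):
        # Cursor positioning: pick a line and a character on that line.
        self.row = min(max(row, 0), len(self.lines) - 1)
        last_col = max(len(self.lines[self.row]) - 1, 0)
        self.col = min(max(col, 0), last_col)
        # If the cursor left the window, shift the window just enough
        # to bring the cursor's line back into view.
        if self.row < self.top:
            self.top = self.row
        elif self.row >= self.top + self.height:
            self.top = self.row - self.height + 1
```

With a 100-line document, `Window(doc).visible()` returns the first 24 lines; `scroll(10)` then shows lines 11 through 34, and `move_to(50, 2)` shifts the window so that the 51st line becomes the bottom line.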
After you position the cursor, you must issue a command to make the desired
edit. The command you choose indicates how much text will be affected: a character, a
word, a line, or a sentence.
Because the same keyboard is used to enter both text and commands, there must
be some way to distinguish between the two. Some word-processing programs assume
that you are entering text unless you specify otherwise; newly entered text either
replaces existing text or pushes it over to make room for the new text. Commands are
entered by pressing special keys on the keyboard, or by combining a standard key with
a special key, such as the control key (CTRL).
*Some editors, such as emacs, can split the terminal screen into multiple windows. In addition, many
high-powered UNIX workstations with large bit-mapped screens have their own windowing software that
allows multiple programs to be run simultaneously in separate windows. For purposes of this book, we
assume you are using the vi editor and an alphanumeric terminal with only a single window.
Other programs assume that you are issuing commands; you must enter a com-
mand before you can type any text at all. There are advantages and disadvantages to
each approach. Starting out in text mode is more intuitive to those coming from a type-
writer, but may be slower for experienced writers, because all commands must be
entered by special key combinations that are often hard to reach and slow down typing.
(We’ll return to this topic when we discuss vi, a UNIX text editor.)
Far more significant than the style of command entry is the range and speed of
commands. For example, though it is heaven for someone used to a typewriter to be
able to delete a word and type in a replacement, it is even better to be able to issue a
command that will replace every occurrence of that word in an entire document. And,
after you start making such global changes, it is essential to have some way to undo
them if you make a mistake.
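A global change paired with an undo can be sketched as follows. The `Document` class is a hypothetical illustration, not the mechanism of any real editor; it assumes whole-word matching and keeps a full copy of the previous text for each change, which is simple but not economical for large documents.

```python
import re


class Document:
    """Text with a whole-word global replace and a simple undo history."""

    def __init__(self, text):
        self.text = text
        self._history = []  # earlier versions of the text, oldest first

    def replace_all(self, old, new):
        """Replace every whole-word occurrence of `old`; return the count."""
        pattern = r"\b" + re.escape(old) + r"\b"
        self._history.append(self.text)
        # A function is used as the replacement so that `new` is inserted
        # literally, even if it contains backslashes.
        self.text, count = re.subn(pattern, lambda m: new, self.text)
        return count

    def undo(self):
        # Restore the text as it was before the most recent change.
        if self._history:
            self.text = self._history.pop()
```

Matching on word boundaries is one way to avoid the classic mistake of a global change that also rewrites part of a longer word, but it is exactly the kind of change you want to be able to undo if the result is not what you expected.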
A word processor that substitutes ease of learning for ease of use by having fewer
commands will ultimately fail the serious writer, because the investment of time spent
learning complex commands can easily be repaid when they simplify complex tasks.
And when you do issue a complex command, it is important that it works as
quickly as possible, so that you aren’t left waiting while the computer grinds away.
The extra seconds add up when you spend hours or days at the keyboard, and, once
having been given a taste of freedom from drudgery, writers want as much freedom as
they can get.
Text editors were developed before word processors (in the rapid evolution of
computers). Many of them were originally designed for printing terminals, rather than
for the CRT-based terminals used by word processors. These programs tend to have
commands that work with text on a line-by-line basis. These commands are often more
obscure than the equivalent office word-processing commands.
However, though the commands used by text editors are sometimes more difficult
to learn, they are usually very effective. (The commands designed for use with slow
paper terminals were often extraordinarily powerful, to make up for the limited capabili-
ties of the input and output device.)
There are two basic kinds of text editors, line editors and screen editors, and both
are available in UNIX. The difference is simple: line editors display one line at a time,
and screen editors can display approximately 24 lines or a full screen.
The line editors in UNIX include ed, sed, and ex. Although these line edi-
tors are obsolete for general-purpose use by writers, there are applications at which they
excel, as we will see in Chapters 7 and 12.
The most common screen editor in UNIX is vi. Learning vi or some other
suitable editor is the first step in mastering the UNIX text-processing environment.
Most of your time will be spent using the editor.
UNIX screen editors such as vi and emacs (another editor available on many
UNIX systems) lack ease-of-learning features common in many word processors (there
are no menus and only primitive on-line help screens, and the commands are often complex
and nonintuitive), but they are powerful and fast. What’s more, UNIX line editors
such as ex and sed give additional capabilities not found in word processors: the
ability to write a script of editing commands that can be applied to multiple files. Such
editing scripts open new ranges of capability to the writer.
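The idea of an editing script can be illustrated in Python. This sketch does not use the syntax of ex or sed; it only mimics the concept of one list of substitutions applied, in order, to several files. The function `apply_script`, the file names, and the two sample substitutions are all assumptions made for the example.

```python
import os
import re
import tempfile


def apply_script(script, paths):
    """Apply each (pattern, replacement) pair, in order, to every file.
    Returns the number of files whose contents changed."""
    changed = 0
    for path in paths:
        with open(path) as f:
            original = f.read()
        text = original
        for pattern, replacement in script:
            # MULTILINE lets `$` match at the end of every line.
            text = re.sub(pattern, replacement, text, flags=re.MULTILINE)
        if text != original:
            with open(path, "w") as f:
                f.write(text)
            changed += 1
    return changed


def demo():
    """Run one script over two small files in a temporary directory."""
    script = [
        (r"\bUnix\b", "UNIX"),   # enforce a consistent spelling
        (r"[ \t]+$", ""),        # strip trailing blanks from each line
    ]
    with tempfile.TemporaryDirectory() as d:
        ch1 = os.path.join(d, "ch01.txt")
        ch2 = os.path.join(d, "ch02.txt")
        with open(ch1, "w") as f:
            f.write("Unix is fun.  \n")
        with open(ch2, "w") as f:
            f.write("No changes here.\n")
        changed = apply_script(script, [ch1, ch2])
        with open(ch1) as f:
            text1 = f.read()
        with open(ch2) as f:
            text2 = f.read()
    return changed, text1, text2
```

The point of the sketch is that the script is written once and then reused: the same list of substitutions could be run over every chapter of a book.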
Document Formatting
Text editing is wonderful, but the object of the writing process is to produce a printed
document for others to read. And a printed document is more than words on paper; it is
an arrangement of text on a page. For instance, the elements of a business letter are
arranged in a consistent format, which helps the person reading the letter identify those
elements. Reports and more complex documents, such as technical manuals or books,
require even greater attention to formatting. The format of a document conveys how
information is organized, assisting in the presentation of ideas to a reader.
Most word-processing programs have built-in formatting capabilities. Formatting
commands are intermixed with editing commands, so that you can shape your document
on the screen. Such formatting commands are simple extensions of those available to
someone working with a typewriter. For example, an automatic centering command
saves the trouble of manually counting characters to center a title or other text. There
may also be such features as automatic pagination and printing of headers or footers.
Text editors, by contrast, usually have few formatting capabilities. Because they
were designed for entering programs, their formatting capabilities tend to be oriented
toward the formats required by one or more programming languages.
Even programmers write reports, however. Especially at AT&T (where UNIX
was developed), there was a great emphasis on document preparation tools to help the
programmers and scientists of Bell Labs produce research reports, manuals, and other
documents associated with their development work.
Word processing, with its emphasis on easy-to-use programs with simple on-
screen formatting, was in its infancy. Computerized phototypesetting, on the other
hand, was already a developed art. Until quite recently, it was not possible to represent
on a video screen the variable type styles and sizes used in typeset documents. As a
result, phototypesetting has long used a markup system that indicates formatting instruc-
tions with special codes. These formatting instructions to the computerized typesetter
are often direct descendants of the instructions that were formerly given to a human
typesetter: center the next line, indent five spaces, boldface this heading.
The text formatter most commonly used with the UNIX system is called nroff.
To use it, you must intersperse formatting instructions (usually one- or two-letter codes
preceded by a period) within your text, then pass the file through the formatter. The
nroff program interprets the formatting codes and reformats the document “on the
fly” while passing it on to the printer. The nroff formatter prepares documents for
printing on line printers, dot-matrix printers, and letter-quality printers. Another program
called troff uses an extended version of the same markup language used by
nroff, but prepares documents for printing on laser printers and typesetters. We’ll
talk more about printing in a moment.
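To make the idea of a markup-driven formatter concrete, here is a toy sketch in Python. It is not nroff and does not reproduce nroff's behavior in detail; it recognizes only two requests, `.ce` (center the next line) and `.sp` (leave a blank line), taken without arguments, and fills all other text into lines of a given width. The function name `toy_format` and the default width are assumptions made for this example.

```python
def toy_format(source, width=40):
    """Format text containing two dot requests; a toy, not nroff itself."""
    out = []        # finished output lines
    pending = []    # words waiting to be filled into lines
    center_next = False

    def flush():
        # Fill the pending words into lines no longer than `width`.
        line = ""
        for word in pending:
            if line and len(line) + 1 + len(word) > width:
                out.append(line)
                line = word
            else:
                line = word if not line else line + " " + word
        if line:
            out.append(line)
        pending.clear()

    for raw in source.splitlines():
        if raw.startswith(".ce"):
            flush()                 # a request ends the current paragraph
            center_next = True
        elif raw.startswith(".sp"):
            flush()
            out.append("")
        elif center_next:
            out.append(raw.strip().center(width).rstrip())
            center_next = False
        else:
            pending.extend(raw.split())
    flush()
    return out
```

Given the input lines `.ce`, `A Title`, `.sp`, followed by ordinary text, the sketch produces a centered title, a blank line, and then the text refilled to the requested width, regardless of where the line breaks fell in the input file.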
Although formatting with a markup language may seem to be a far inferior system
to the “what you see is what you get” (wysiwyg) approach of most office word-
processing programs, it actually has many advantages.
First, unless you are using a very sophisticated computer, with very sophisticated
software (what has come to be called an electronic publishing system, rather than a
mere word processor), it is not possible to display everything on the screen just as it
will appear on the printed page. For example, the screen may not be able to represent
boldfacing or underlining except with special formatting codes. WordStar, one of the
grandfathers of word-processing programs for personal computers, represents underlining
by surrounding the word or words to be underlined with the special control character
^S (the character generated by holding down the control key while typing the letter
S). For example, the following title line would be underlined when the document is
printed:
^SWord Processing with WordStar^S
Even more significantly, if you later decide to change the design, you simply
change the definition of the relevant design elements. If you have used a word processor
to format the document as it was written, it is usually a painful task to go back and
change the format.
Some word-processing programs, such as Microsoft WORD, include features for
defining global document formats, but these features are not as widespread as they are
in markup systems.
Printing
The formatting capabilities of a word-processing system are limited by what can be out-
put on a printer. For example, some printers cannot backspace and therefore cannot
underline. For this discussion, we are considering four different classes of printers: dot
matrix, letter quality, phototypesetter, and laser.
A dot-matrix printer composes characters as a series of dots. It is usually suitable
for preparing interoffice memos and obtaining fast printouts of large files.
This paragraph was printed with a dot-matrix printer. It uses a print
head containing 9 pins, which are adjusted to produce the shape of each
character. More sophisticated dot-matrix printers have print heads
containing up to 24 pins. The greater the number of pins, the finer
the dots that are printed, and the more possible it is to fool the eye
into thinking it sees a solid character. Dot-matrix printers are also
capable of printing out graphic displays.
This paragraph, like the rest of this book, was phototypeset. In photo-
typesetting, a photographic technique is used to print characters on film or
photographic paper. There is a wide choice of type styles, and the characters
are much more finely formed than those produced by a letter-quality
printer. Characters are produced by an arrangement of tiny dots, much like
a dot-matrix printer, but there are over 1000 dots per inch.
There are several major advantages to typesetting. The high resolution allows for the
design of aesthetically pleasing type. The shape of the characters is much finer. In
addition, where dot-matrix and letter-quality type is usually constant width (narrow
letters like i take up the same amount of space as wide ones like m), typesetters use
variable-width type, in which narrow letters take up less space than wide ones. In addi-
tion, it’s possible to mix styles (for example, bold and italic) and sizes of type on the
same page.
Most typesetting equipment uses a markup language rather than a wysiwyg
approach to specify point sizes, type styles, leading, and so on. Until recently, the tech-
nology didn’t even exist to represent on a screen the variable-width typefaces that
appear in published books and magazines.
AT&T, a company with its own extensive internal publishing operation,
developed its own typesetting markup language and typesetting program, a sister to
nroff called troff (typesetter-roff). Although troff extends the capabilities of
nroff in significant ways, it is almost totally compatible with it.
Until recently, unless you had access to a typesetter, you didn’t have much use for
troff. The development of low-cost laser printers that can produce near typeset-
quality output at a fraction of the cost has changed all that.
Word-processing software (particularly that developed for the Apple Macintosh, which
has a high-resolution graphics screen capable of representing variable type fonts) is
beginning to tap the capabilities of laser printers. However, most of the
microcomputer-based packages still have many limitations. Nonetheless, a markup
language such as that provided by troff still provides the easiest and lowest-cost
access to the world of electronic publishing for many types of documents.
The point made previously, that markup languages are preferable to wysiwyg sys-
tems for large documents, is especially true when you begin to use variable size fonts,
leading, and other advanced formatting features. It is easy to lose track of the overall
format of your document and difficult to make overall changes after your formatted text
is in place. Only the most expensive electronic publishing systems (most of them based
on advanced UNIX workstations) give you both the capability to see what you will get
on the screen and the ability to define and easily change overall document formats.
In the past, many of the steps in creating a finished book were out of the hands of
the writer. Proofreaders and copyeditors went over the text for spelling and grammati-
cal errors. It was generally the printer who did the typesetting (a service usually paid
by the publisher). At the print shop, a typesetter (a person) retyped the text and specified
the font sizes and styles. A graphic artist, performing layout and pasteup, made
many of the decisions about the appearance of the printed page.
Although producing a high-quality book can still involve many people, UNIX
provides the tools that allow a writer to control the process from start to finish. An
analogy is the difference between an assembly worker on a production line who views
only one step in the process and a craftsman who guides the product from beginning to
end. The craftsman has his own system of putting together a product, whereas the
assembly worker has the system imposed upon him.
After you are acquainted with the basic tools available in UNIX and have spent
some time using them, you can design additional tools to perform work that you think
is necessary and helpful. To create these tools, you will write shell scripts that use the
resources of UNIX in special ways. We think there is a certain satisfaction that comes
with accomplishing such tasks by computer. It seems to us to reward careful thought.
What programming means to us is that when we confront a problem that normally
submits only to tedium or brute force, we think of a way to get the computer to solve
the problem. Doing this often means looking at the problem in a more general way and
solving it in a way that can be applied again and again.
One of the most important books on UNIX is The UNIX Programming Environment
by Brian W. Kernighan and Rob Pike. They write that what makes UNIX effective
“is an approach to programming, a philosophy of using the computer.” At the
heart of this philosophy “is the idea that the power of a system comes more from the
relationships among programs than from the programs themselves.”
When we talk about building a document preparation system, it is this philosophy
that we are trying to apply. As a consequence, this is a system that has great flexibility
and gives the builders a feeling of breaking new ground. The UNIX text-processing
environment is a system that can be tailored to the specific tasks you want to accom-
plish. In many instances, it can let you do just what a word processor does. In many
more instances, it lets you use more of the computer to do things that a word processor
either can’t do or can’t do very well.
C H A P T E R
UNIX Fundamentals
The UNIX operating system is a collection of programs that controls and organizes the
resources and activities of a computer system. These resources consist of hardware
such as the computer’s memory, various peripherals such as terminals, printers, and disk
drives, and software utilities that perform specific tasks on the computer system. UNIX
is a multiuser, multitasking operating system that allows the computer to perform a
variety of functions for many users. It also provides users with an environment in
which they can access the computer’s resources and utilities. This environment is
characterized by its command interpreter, the shell.
In this chapter, we review a set of basic concepts for users working in the UNIX
environment. As we mentioned in the preface, this book does not replace a general
introduction to UNIX. A complete overview is essential to anyone not familiar with the
file system, input and output redirection, pipes and filters, and many basic utilities. In
addition, there are different versions of UNIX, and not all commands are identical in
each version. In writing this book, we’ve used System V Release 2 on a Convergent
Technologies’ Miniframe.
These disclaimers aside, if it has been a while since you tackled a general intro-
duction, this chapter should help refresh your memory. If you are already familiar with
UNIX, you can skip or skim this chapter.
As we explain these basic concepts, using a tutorial approach, we demonstrate the
broad capabilities of UNIX as an applications environment for text-processing. What
you learn about UNIX in general can be applied to performing specific tasks related to
text-processing.
The UNIX Shell
As an interactive computer system, UNIX provides a command interpreter called a
shell. The shell accepts commands typed at your terminal, invokes a program to per-
form specific tasks on the computer, and handles the output or result of this program,
normally directing it to the terminal’s video display screen.
UNIX commands can be simple one-word entries like the date command:
$ date
Tue Apr 8 13:23:41 EST 1987
Or their usage can be more complex, requiring that you specify options and arguments,
such as filenames. Although some commands have a peculiar syntax, many UNIX
commands follow this general form:
command option(s) argument(s)
A command identifies a software program or utility. Commands are entered in
lowercase letters. One typical command, ls, lists the files that are available in your
immediate storage area, or directory.
An option modifies the way in which a command works. Usually options are
indicated by a minus sign followed by a single letter. For example, ls -l modifies
what information is displayed about a file. The set of possible options is particular to
the command and generally only a few of them are regularly used. However, if you
want to modify a command to perform in a special manner, be sure to consult a UNIX
reference guide and examine the available options.
An argument can specify an expression or the name of a file on which the com-
mand is to act. Arguments may also be required when you specify certain options. In
addition, if more than one filename is being specified, special metacharacters (such as
* and ?) can be used to represent the filenames. For instance, ls -l ch* will
display information about all files that have names beginning with ch.
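The general form of a command can be tried out safely in a scratch directory. The following is a minimal sketch, assuming a Bourne-compatible shell and the mktemp utility; the file names ch01, ch02, and notes are made up for illustration.

```shell
# Work in a throwaway directory so the listing is predictable.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch ch01 ch02 notes          # hypothetical files

# Command only: list every name in the directory.
all_names=$(ls)

# Command plus argument: the shell expands ch* before ls runs.
chapter_names=$(ls ch*)

# Command plus option plus argument: -l adds permissions, owner, size, and date.
long_listing=$(ls -l ch*)

echo "$all_names"
echo "$chapter_names"
echo "$long_listing"
```

Because the shell, not ls, expands ch*, the same pattern works as an argument to any other command.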
The UNIX shell is itself a program that is invoked as part of the login process.
When you have properly identified yourself by logging in, the UNIX system prompt
appears on your terminal screen.
The prompt that appears on your screen may be different from the one shown in
the examples in this book. There are two widely used shells: the Bourne shell and the
C shell. Traditionally, the Bourne shell uses a dollar sign ($) as a system prompt, and
the C shell uses a percent sign (%). The two shells differ in the features they provide
and in the syntax of their programming constructs. However, they are fundamentally
very similar. In this book, we use the Bourne shell.
Your prompt may be different from either of these traditional prompts. This is
because the UNIX environment can be customized and the prompt may have been
changed by your system administrator. Whatever the prompt looks like, when it
appears, the system is ready for you to enter a command.
When you type a command from the keyboard, the characters are echoed on the
screen. The shell does not interpret the command until you press the RETURN key.
This means that you can use the erase character (usually the DEL or BACKSPACE key)
to correct typing mistakes. After you have entered a command line, the shell tries to
identify and locate the program specified on the command line. If the command line
that you entered is not valid, then an error message is returned.
When a program is invoked and processing begun, the output it produces is sent
to your screen, unless otherwise directed. To interrupt and cancel a program before it
has completed, you can press the interrupt character (usually CTRL-C or the DEL key).
If the output of a command scrolls by the screen too fast, you can suspend the output by
pressing the suspend character (usually CTRL-S) and resume it by pressing the resume
character (usually CTRL-Q).
Some commands invoke utilities that offer their own environment, with a command
interpreter and a set of special “internal” commands. A text editor is one such
utility, the mail facility another. In both instances, you enter commands while you are
“inside” the program. In these kinds of programs, you must use a command to exit
and return to the system prompt.
The return of the system prompt signals that a command is finished and that you
can enter another command. Familiarity with the power and flexibility of the UNIX
shell is essential to working productively in the UNIX environment.
Output Redirection
Some programs do their work in silence, but most produce some kind of result, or output.
There are generally two types of output: the expected result (referred to as standard
output) and error messages (referred to as standard error). Both types of output
are normally sent to the screen and appear to be indistinguishable. However, they can
be manipulated separately, a feature we will later put to good use.
Let’s look at some examples. The echo command is a simple command that
displays a string of text on the screen.
$ echo my name
my name
In this case, the input echo my name is processed and its output is my name.
The name of the command, echo, refers to a program that interprets the command-line
arguments as a literal expression that is sent to standard output. Let’s replace
echo with a different command called cat:
$ cat my name
cat: Cannot open my
cat: Cannot open name
The cat program takes its arguments to be the names of files. If these files existed,
their contents would be displayed on the screen. Because the arguments were not
filenames in this example, an error message was printed instead.
The output from a command can be sent to a file instead of the screen by using
the output redirection operator (>). In the next example, we redirect the output of the
echo command to a file named reminders.
$ echo Call home at 3:00 > reminders
$
No output is sent to the screen, and the UNIX prompt returns when the program is finished.
Now the cat command should work because we have created a file.
$ cat reminders
Call home at 3:00
The contents of both reminders and todolist are combined into do_now.
The original files remain intact.
If one of the files does not exist, an error message is printed, even though stan-
dard output is redirected:
$ rm todolist
$ cat reminders todolist > do_now
cat: todolist: not found
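The separation of standard output and standard error can be seen by sending each to its own file. This is a minimal sketch, assuming a Bourne-compatible shell and the mktemp utility. The 2> operator is not introduced in the text above; in the Bourne shell it redirects standard error. The file name errors is chosen only for illustration.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo Call home at 3:00 > reminders

# todolist does not exist, so cat writes a complaint to standard error.
# > captures standard output; 2> captures standard error separately.
cat reminders todolist > do_now 2> errors
status=$?                 # nonzero, because one file could not be opened

cat do_now                # the contents of reminders
cat errors                # the error message about todolist
```

The exact wording of the error message varies from one UNIX version to another, so only its presence should be relied on.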
The files we’ve created are stored in our current working directory.
*In addition to subdirectories, the root directory can contain other file systems. A file system is the skeletal
structure of a directory tree, which is built on a magnetic disk before any files or directories are stored on it.
On a system containing more than one disk, or on a disk divided into several partitions, there are multiple
file systems. However, this is generally invisible to the user, because the secondary file systems are
mounted on the root directory, creating the illusion of a single file system.
On many UNIX systems, users store their files in the /usr file system. (As disk
storage has become cheaper and larger, the placement of user directories is no longer
standard. For example, on our system, /usr contains only UNIX software: user
accounts are in a separate file system called /work.)
Fred’s home directory is /usr/fred. It is the location of Fred’s account on
the system. When he logs in, his home directory is his current working directory. Your
working directory is where you are currently located and changes as you move up and
down the file system.
A pathname specifies the location of a directory or file on the UNIX file system.
An absolute pathname specifies where a file or directory is located off the root file sys-
tem. A relative pathname specifies the location of a file or directory in relation to the
current working directory.
To find out the pathname of our current directory, enter pwd.
$ pwd
/usr/fred
The absolute pathname of the current working directory is /usr/fred. The ls
command lists the contents of the current directory. Let’s list the files and subdirectories
in /usr/fred by entering the ls command with the -F option. This option
prints a slash (/) following the names of subdirectories. In the following example,
oldstuff is a directory, and notes and reminders are files.
$ ls -F
reminders
notes
oldstuff/
When you specify a filename with the ls command, it simply prints the name of
the file, if the file exists. When you specify the name of a directory, it prints the names
of the files and subdirectories in that directory.
$ ls reminders
reminders
$ ls oldstuff
ch01-draft
letter.212
memo
In this example, a relative pathname is used to specify oldstuff. That is, its location
is specified in relation to the current directory, /usr/fred. You could also
enter an absolute pathname, as in the following example:
$ ls /usr/fred/oldstuff
ch01-draft
letter.212
memo
Similarly, you can use an absolute or relative pathname to change directories using the
cd command. To move from /usr/fred to /usr/fred/oldstuff, you can
enter a relative pathname:
$ cd oldstuff
$ pwd
/usr/fred/oldstuff
You could also use this variable in pathnames to specify a file or directory in your
home directory.
$ ls $HOME/oldstuff/memo
/usr/fred/oldstuff/memo
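Relative and absolute pathnames can be compared side by side in a scratch directory. This is a minimal sketch, assuming a Bourne-compatible shell and the mktemp utility; the variable home_dir is a hypothetical stand-in for a home directory such as /usr/fred.

```shell
# A scratch tree standing in for a home directory (names are hypothetical).
home_dir=$(mktemp -d)
mkdir "$home_dir/oldstuff"
touch "$home_dir/oldstuff/memo"

cd "$home_dir" || exit 1
relative_listing=$(ls oldstuff)                # relative to the current directory
absolute_listing=$(ls "$home_dir/oldstuff")    # absolute: starts from the root

cd oldstuff || exit 1                          # relative move down one level
here=$(pwd)

echo "$relative_listing"
echo "$absolute_listing"
echo "$here"
```

Both listings name the same file; only the starting point of the pathname differs.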
Permissions
Access to UNIX files is governed by ownership and permissions. If you create a file,
you are the owner of the file and can set the permissions for that file to give or deny
access to other users of the system. There are three different levels of permission:
Thus, you can set read, write, and execute permissions for the three levels of own-
ership. This can be represented as:
   rwxrwxrwx
   /   |   \
owner group other
When you enter the command ls -l, information about the status of the file is
displayed on the screen. You can determine what the file permissions are, who the
owner of the file is, and with what group the file is associated.
$ ls -l meet.306
-rw-rw-r-- 1 fred techpubs 126 March 6 10:32 meet.306
This file has read and write permissions set for the user fred and the group
techpubs. All others can read the file, but they cannot modify it. Because fred is
the owner of the file, he can change the permissions, making it available to others or
denying them access to it. The chmod command is used to set permissions. For
instance, if he wanted to make the file writeable by everyone, he would enter:
$ chmod o+w meet.306
$ ls -l meet.306
-rw-rw-rw- 1 fred techpubs 126 March 6 10:32 meet.306
This translates to “add write permission (+w) to others (o).” If he wanted to remove
write permission from a file, keeping anyone but himself from accidentally modifying a
finished document, he might enter:
$ chmod go-w meet.306
$ ls -l meet.306
-rw-r--r-- 1 fred techpubs 126 March 6 10:32 meet.306
This command removes write permission (-w) from group (g) and other (o).
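The two chmod commands can be replayed on a scratch file. This is a minimal sketch, assuming a Bourne-compatible shell and the mktemp and cut utilities. The numeric mode 664 is not covered in the text; it is used here only to force the starting permissions to -rw-rw-r-- regardless of local defaults.

```shell
# Work in a throwaway directory on a scratch copy of the file name.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
echo "Meeting notes" > meet.306
chmod 664 meet.306                      # start from -rw-rw-r--

chmod o+w meet.306                      # add write permission for others
after_add=$(ls -l meet.306 | cut -c1-10)

chmod go-w meet.306                     # remove write from group and others
after_remove=$(ls -l meet.306 | cut -c1-10)

echo "$after_add"                       # -rw-rw-rw-
echo "$after_remove"                    # -rw-r--r--
```

The cut command keeps only the first ten characters of the listing, which are the file type and the nine permission flags.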
File permissions are important in UNIX, especially when you start using a text
editor to create and modify files. They can be used to protect information you have on
the system.
Special Characters
As part of the shell environment, there are a few special characters (metacharacters) that
make working in UNIX much easier. We won’t review all the special characters, but
enough of them to make sure you see how useful they are.
The asterisk (*) and the question mark (?) are filename generation metacharac-
ters. The asterisk matches any or all characters in a string. By itself, the asterisk
expands to all the names in the specified directory.
$ echo *
meet.306 oldstuff reports
In this example, the echo command displays in a row the names of all the files and
directories in the current directory. The asterisk can also be used as a shorthand nota-
tion for specifying one or more files.
$ ls meet*
meet.306
$ ls /work/textp/ch*
/work/textp/ch01
/work/textp/ch02
/work/textp/ch03
/work/textp/chapter_make
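The difference between the asterisk and the question mark shows up clearly when both are applied to the same set of names. This is a minimal sketch, assuming a Bourne-compatible shell and the mktemp utility; the files are empty placeholders created only for the demonstration.

```shell
# Create a few empty placeholder files in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch ch01 ch02 ch03 chapter_make meet.306

star_matches=$(echo ch*)        # every name beginning with ch
question_matches=$(echo ch0?)   # ch0 followed by exactly one character

echo "$star_matches"            # ch01 ch02 ch03 chapter_make
echo "$question_matches"        # ch01 ch02 ch03
```

Using echo rather than ls is a convenient way to preview what a pattern will expand to before handing it to a command that changes files.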
Besides filename metacharacters, there are other characters that have special meaning
when placed in a command line. The semicolon (;) separates multiple commands on
the same command line. Each command is executed in sequence from left to right, one
before the other.
$ cd oldstuff; pwd; ls
/usr/fred/oldstuff
ch01-draft
letter.212
memo
notes
Another special character is the ampersand (&). The ampersand signifies that a com-
mand should be processed in the background, meaning that the shell does not wait for
the program to finish before returning a system prompt. When a program takes a signi-
ficant amount of processing time, it is best to have it run in the background so that you
can do other work at your terminal in the meantime. We will demonstrate background
processing in Chapter 4 when we look at the nroff/troff text formatter.
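The semicolon and the ampersand can both be tried with harmless commands. This is a minimal sketch, assuming a Bourne-compatible shell and the mktemp utility. The sleep command stands in for a slow program, and the $! variable and the wait command, which are not introduced in the text above, are used only to collect the background job's result.

```shell
# Semicolons: commands run one after another, left to right.
sequence=$(echo first; echo second; echo third)
echo "$sequence"

# Ampersand: the shell starts the command and returns at once.
outfile=$(mktemp)
( sleep 1; echo finished > "$outfile" ) &
job=$!                        # process ID of the background job
echo "prompt is back while job $job runs"

wait "$job"                   # pause here until the background job ends
result=$(cat "$outfile")
echo "$result"                # finished
```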
Environment Variables
The shell stores useful information about who you are and what you are doing in
environment variables. Entering the set command will display a list of the environment
variables that are currently defined in your account.
$ set
PATH .:bin:/usr/bin:/usr/local/bin:/etc
argv ()
cwd /work/textp/ch03
home /usr/fred
shell /bin/sh
status 0
TERM wy50
These variables can be accessed from the command line by prefacing their name with a
dollar sign:
$ echo $TERM
wy50
The TERM variable identifies what type of terminal you are using. It is important that
you correctly define the TERM environment variable, especially because the vi text
editor relies upon it. Shell variables can be reassigned from the command line. Some
variables, such as TERM, need to be exported if they are reassigned, so that they are
available to all shell processes.
$ TERM=tvi925; export TERM          Tell UNIX I’m using a Televideo 925
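What exporting changes can be checked by asking a child shell what it sees. This is a minimal sketch, assuming a Bourne-compatible shell; the variable note is hypothetical, and the ${note:-not set} form, which substitutes a default for an unset variable, is used only to make the missing value visible.

```shell
unset note
note=draft                    # ordinary shell variable, not exported
TERM=tvi925; export TERM      # exported, as in the example above

# sh -c starts a child shell; the single quotes leave the expansion to the child.
child_term=$(sh -c 'echo "$TERM"')
child_note=$(sh -c 'echo "${note:-not set}"')

echo "$child_term"            # tvi925
echo "$child_note"            # not set
```

Programs such as vi run as child processes of the shell, which is why a reassigned TERM has to be exported before they can see the new value.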
You can also define your own environment variables for use in commands.
$ friends="alice ed ralph"
$ echo $friends
alice ed ralph
This command sends the mail message to three people whose names are defined in the
friends environment variable. Pathnames can also be assigned to environment vari-
ables, shortening the amount of typing:
$ pwd
/usr/fred
$ book="/work/textp"
$ cd $book
$ pwd
/work/textp
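Both uses of your own variables, a list of names and a stored pathname, can be tried without touching real directories. This is a minimal sketch, assuming a Bourne-compatible shell and the mktemp utility; the scratch directory stands in for a long pathname such as /work/textp, and set -- is used only to count how many arguments the variable expands to.

```shell
book=$(mktemp -d)             # stand-in for a long pathname such as /work/textp
mkdir "$book/ch03"
friends="alice ed ralph"

# The variable saves retyping the long pathname.
cd "$book/ch03" || exit 1
where=$(pwd)

# Unquoted, $friends splits into three separate arguments.
set -- $friends
friend_count=$#

echo "$where"
echo "$friend_count"          # 3
```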
might produce:
10 10 72
Because all programs expect, and produce, only a data stream, that data stream can
easily be processed by multiple programs in sequence.
One of the most common uses of filters is to process output from a command.
Usually, the processing modifies it by rearranging it or reducing the amount of informa-
tion it displays. For example:
$ who List who is on the system, and at which terminal
peter    tty001   Mar 6 17:12
walter   tty003   Mar 6 13:51
chris    tty004   Mar 6 15:53
val      tty020   Mar 6 15:48
tim      tty005   Mar 4 17:23
ruth     tty006   Mar 6 17:02
fred     tty000   Mar 6 10:34
dale     tty008   Mar 6 15:26
$ who | sort          List the same information in alphabetic order
chris    tty004   Mar 6 15:53
dale     tty008   Mar 6 15:26
fred     tty000   Mar 6 10:34
peter    tty001   Mar 6 17:12
ruth     tty006   Mar 6 17:02
tim      tty005   Mar 4 17:23
val      tty020   Mar 6 15:48
walter   tty003   Mar 6 13:51
$
One of the beauties of UNIX is that almost any program can be used to filter the output
of any other. The pipe is the master key to building command sequences that go
beyond the capabilities provided by a single program and allow users to create custom
“programs” of their own to meet specific needs.
If a command line gets too long to fit on a single screen line, simply type a
backslash followed by a carriage return, or (if a pipe symbol comes at the appropriate
place) a pipe symbol followed by a carriage return. Instead of executing the command,
the shell will give you a secondary prompt (usually >) so you can continue the line:
$ echo This is a long line shown here as a demonstration |
> wc
1 10 49
This feature works in the Bourne shell only.
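Pipelines of two or more filters can be tested on fixed input before being used on live data. This is a minimal sketch, assuming a Bourne-compatible shell with printf, sort, cut, wc, tr, and head available; the variable listing is a hypothetical stand-in for the output of who, so that the results are predictable.

```shell
# Stand-in for the output of who.
listing='peter tty001
walter tty003
chris tty004'

# One filter: put the lines in alphabetic order.
sorted=$(printf '%s\n' "$listing" | sort)

# Three stages: print the listing, keep the first column, count the lines.
# tr removes any padding that wc places around the number.
user_count=$(printf '%s\n' "$listing" | cut -d' ' -f1 | wc -l | tr -d ' ')

# A line that ends in a pipe symbol continues on the next line.
first_user=$(printf '%s\n' "$listing" |
    sort |
    head -n 1)

echo "$sorted"
echo "$user_count"            # 3
echo "$first_user"            # chris tty004
```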
Shell Scripts
A shell script is a file that contains a sequence of UNIX commands. Part of the flexi-
bility of UNIX is that anything you enter from the terminal can be put in a file and exe-
cuted. To give a simple example, we’ll assume that the last command example (grep)
has been stored in a file called whoison:
$ cat whoison
who | grep tty001
The permissions on this file must be changed to make it executable. After a file
is made executable, its name can be entered as a command.
$ chmod +x whoison
$ ls -l whoison
-rwxrwxr-x 1 fred doc 123 Mar 6 17:34 whoison
$ whoison
peter tty001 Mar 6 17:12
Shell scripts can do more than simply function as a batch command facility. The basic
constructs of a programming language are available for use in a shell script, allowing
users to perform a variety of complicated tasks with relatively simple programs.
The simple shell script shown above is not very useful because it is too specific.
However, instead of specifying the name of a single terminal line in the file, we can
read the name as an argument on the command line. In a shell script, $1 represents
the first argument on the command line.
$ cat whoison
who | grep $1
Now we can find who is logged on to any terminal:
$ whoison tty004
chris tty004 Mar 6 15:53
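The whole cycle of writing a script, making it executable, and passing it an argument can be rehearsed in a scratch directory. This is a minimal sketch, assuming a Bourne-compatible shell and the mktemp utility. So that the result is predictable, this version of whoison filters a fixed two-line listing instead of the real output of who, and the #!/bin/sh line and the here-document used to create the file are conveniences not covered in the text.

```shell
workdir=$(mktemp -d)

# Write the script. The quoted 'EOF' keeps $1 from being expanded now,
# so it is still a literal $1 inside the file.
cat > "$workdir/whoison" <<'EOF'
#!/bin/sh
# Stand-in for "who": a fixed listing filtered by the first argument.
printf '%s\n' 'peter tty001 Mar 6 17:12' 'chris tty004 Mar 6 15:53' | grep "$1"
EOF

chmod +x "$workdir/whoison"             # make the file executable

# Run it by name, passing the terminal line as the first argument.
result=$("$workdir/whoison" tty004)
echo "$result"                          # chris tty004 Mar 6 15:53
```

Quoting "$1" inside the script keeps an argument that contains spaces together as a single pattern.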
Later in this book, we will look at shell scripts in detail. They are an important part of
the writer’s toolbox, because they provide the “glue” for users of the UNIX system,
the mechanism by which all the other tools can be made to work together.
C H A P T E R
Learning vi
UNIX has a number of editors that can process the contents of readable files, whether
those files contain data, source code, or text. There are line editors, such as ed and
ex, which display a line of the file on the screen, and there are screen editors, such as
vi and emacs, which display a part of the file on your terminal screen.
The most useful standard text editor on your system is vi. Unlike emacs, it is
available in nearly identical form on almost every UNIX system, thus providing a kind
of text editing lingua franca. The same might be said of ed and ex, but screen editors
are generally much easier to use. With a screen editor you can scroll the page,
move the cursor, delete lines, insert characters, and more, while seeing the results of
your edits as you make them. Screen editors are very popular because they allow you
to make changes as you read a file, much as you would edit a printed copy, only faster.
To many beginners, vi looks unintuitive and cumbersome: instead of letting
you type normally and use special control keys for word-processing functions, it uses all
of the regular keyboard keys for issuing commands. You must be in a special insert
mode before you can type. In addition, there seem to be so many commands.
You can’t learn vi by memorizing every single vi command. Begin by learn-
ing some basic commands. As you do, be aware of the patterns of usage that com-
mands have in common. Be on the lookout for new ways to perform tasks, experiment-
ing with new commands and combinations of commands.
As you become more familiar with vi, you will find that you need fewer keystrokes
to tell vi what to do. You will learn shortcuts that transfer more and more of
the editing work to the computer, where it belongs. Not as much memorization is
required as first appears from a list of vi commands. Like any skill, the more editing
you do, the more you know about it and the more you can accomplish.
This chapter has three sections, and each one corresponds to a set of material
about vi that you should be able to tackle in a single session. After you have finished
each session, put aside the book for a while and do some experimenting. When you
feel comfortable with what you have learned, continue to the next session.
You can use vi to edit any file that contains readable text, whether it is a report, a
series of shell commands, or a program. The vi editor copies the file to be edited into
a buffer (an area temporarily set aside in memory), displays as much of the buffer as
possible on the screen, and lets you add, delete, and move text. When you save your
edits, vi copies the buffer into a permanent file, overwriting the contents of the old
file.
Opening a File
The syntax for the vi command is:
vi [filename]
where filename is the name of either an existing file or a new file. If you don’t specify
a filename, vi will open an unnamed buffer, and ask you to name it before you can
save any edits you have made. Press RETURN to execute the command.
A filename must be unique inside its directory. On AT&T (System V) UNIX sys-
tems, it cannot exceed 14 characters. (Berkeley UNIX systems allow longer filenames.)
A filename can include any ASCII character except /, which is reserved as the separa-
tor between files and directories in a pathname. You can even include spaces in a
filename by “escaping” them with a backslash. In practice, though, filenames consist
of any combination of uppercase and lowercase letters, numbers, and the characters .
(dot) and _ (underscore). Remember that UNIX is case-sensitive: lowercase filenames
are distinct from uppercase filenames, and, by convention, lowercase is preferred.
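Escaping a space with a backslash can be tried on a scratch file. This is a minimal sketch, assuming a Bourne-compatible shell and the mktemp utility; the filename my notes is made up for illustration.

```shell
# Work in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The backslash escapes the space, so the shell sees one filename: "my notes".
echo draft > my\ notes
spaced=$(cat my\ notes)

# Exactly one file was created, not two.
file_count=$(ls | wc -l | tr -d ' ')

echo "$spaced"                # draft
echo "$file_count"            # 1
```

Every later command that names the file needs the same escaping, which is one reason filenames without spaces are the usual practice.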
If you want to open a new file called notes in the current directory, enter:
$ vi notes
The vi command clears the screen and displays a new buffer for you to begin work.
Because notes is a new file, the screen displays a column of tildes (~) to indicate
that there is no text in the file, not even blank lines.
If you specify the name of a file that already exists, its contents will be displayed on the
screen. For example:
$ vi letter
The prompt line at the bottom of the screen echoes the name and size of the file.
Sometimes when you invoke vi, you may get either of the following messages:
[using open mode]
or:
Visual needs addressable cursor or upline capability
In both cases, there is a problem identifying the type of terminal you are using. You
can quit the editing session immediately by typing :q.
Although vi can run on almost any terminal, it must know what kind of terminal
you are using. The terminal type is usually set as part of the UNIX login sequence. If
you are not sure whether your terminal type is defined correctly, ask your system
administrator or an experienced user to help you set up your terminal. If you know
your terminal type (wy50 for instance), you can set your TERM environment variable
with the following command:
TERM=wy50; export TERM
vi Commands
The vi editor has two modes: command mode and insert mode. Unlike many word
processors, vi’s command mode is the initial or default mode. To insert lines of text,
you must give a command to enter insert mode and then type away.
Most commands consist of one or two characters. For example:
i insert
c change
Using letters as commands, you can edit a file quickly. You don’t have to
memorize banks of function keys or stretch your fingers to reach awkward combinations
of keys.
In general, vi commands
There is also a special group of commands that echo on the bottom line of the
screen. Bottom-line commands are indicated by special symbols. The slash ( / ) and the
question mark (?) begin search commands, which are discussed in session 2. A colon
( :) indicates an ex command. You are introduced to one ex command (to quit a file
without saving edits) in this chapter, and the ex line editor is discussed in detail in
Chapter 7.
To tell vi that you want to begin insert mode, press i. Nothing appears on the
screen, but you can now type any text at the cursor. To tell vi to stop inserting text,
press ESC and you will return to command mode.
For example, suppose that you want to insert the word introduction. If you type
the keystrokes iintroduction, what appears on the screen is
introduction
Because you are starting out in command mode, vi interprets the first keystroke (i) as
the insert command. All keystrokes after that result in characters placed in the file,
until you press ESC. If you need to correct a mistake while in insert mode, backspace
and type over the error.
While you are inserting text, press RETURN to break the lines before the right
margin. An autowrap option provides a carriage return automatically after you exceed
the right margin. To move the right margin in ten spaces, for example, enter :set
wm=10.
Sometimes you may not know if you are in insert mode or command mode.
Whenever vi does not respond as you expect, press ESC. When you hear a beep, you
are in command mode.
Saving a File
You can quit working on a file at any time, save the edits, and return to the UNIX
prompt. The vi command to quit and save edits is ZZ. (Note that ZZ is capitalized.)
Let’s assume that you create a file called letter to practice vi commands
and that you type in 36 lines of text. To save the file, first check that you are in command
mode by pressing ESC, and then give the write and save command, ZZ. Your
file is saved as a regular file. The result is:
“letter” [New file] 36 lines, 1331 characters
You return to the UNIX prompt. If you check the list of files in the directory, by typing
ls at the prompt, the new file is listed:
$ ls
ch01 ch02 letter
You now know enough to create a new file. As an exercise, create a file called
letter and insert the text shown in Figure 3-1. When you have finished, type ZZ to
save the file and return to the UNIX prompt.
[Figure 3-1: sample letter dated April 1, 1987, signed "Sincerely, Fred Caslon"]
To move the cursor, make sure you are in command mode by pressing ESC. Give the
command for moving forward or backward in the file from the current cursor position.
When you have gone as far in one direction as possible, you’ll hear a beep and the cursor
stops. You cannot move the cursor past the tildes (~) at the end of the file.
Single Movements
The keys h, j, k, and l, right under your fingertips, will move the cursor:
h    left one space
j    down one line
k    up one line
l    right one space
You could use the cursor arrow keys (↑, ↓, ←, →) or the RETURN and BACKSPACE
keys, but they are out of the way and are not supported on all terminals.
You can also combine the h, j, k, and l keys with numeric arguments and
other vi commands.
Numeric Arguments
You can precede movement commands with numbers. The command 4l moves the
cursor (shown as a small box around a letter) four spaces to the right, just like typing
the letter l four times (llll).
4l    move right 4 characters
This one concept (being able to multiply commands) gives you more options (and
power) for each command. Keep it in mind as you are introduced to additional commands.
Movement by Lines
When you saved the file letter, the editor displayed a message telling you how
many lines were in that file. A line in the file is not necessarily the same length as a
physical line (limited to 80 characters) that appears on the screen. A line is any text
entered between carriage returns. If you type 200 characters before pressing RETURN,
vi regards all 200 characters as a single line (even though those 200 characters look
like several physical lines on the screen).
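The arithmetic behind this can be sketched in a few lines of shell. The function name screen_rows is hypothetical, and the sketch assumes a plain 80-column screen with no tabs and no line-number column.

```shell
# screen_rows LENGTH [WIDTH]: physical rows needed to show one line
screen_rows() {
    len=$1
    width=${2:-80}           # 80 columns, as assumed in the text
    if [ "$len" -eq 0 ]; then
        echo 1               # an empty line still occupies one row
    else
        echo $(( (len + width - 1) / width ))   # divide, rounding up
    fi
}

screen_rows 200    # prints 3
screen_rows 80     # prints 1
```

A 200-character line therefore occupies three physical rows but still counts as one line for commands such as 0 and $.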
Two useful commands in line movement are:
0 <zero> move to beginning of line
$ move to end of line
In the following file, the line numbers are shown. To get line numbers on your screen,
enter :set nu.
1 With the screen editor you can scroll the page,
2 move the cursor, delete lines, and insert characters,
  while seeing the results of edits as you make them.
3 Screen editors are very popular.
The number of logical lines (3) does not correspond to the number of physical lines (4)
that you see on the screen. If you enter $, with the cursor positioned on the d in the
word delete, the cursor would move to the period following the word them.
1 With the screen editor you can scroll the page,
2 move the cursor, delete lines, and insert characters,
  while seeing the results of edits as you make them.
3 Screen editors are very popular.
If you enter 0 (zero), the cursor would move back to the letter t in the word the, at the
beginning of the line.
1 With the screen editor you can scroll the page,
2 move the cursor, delete lines, and insert characters,
  while seeing the results of edits as you make them.
3 Screen editors are very popular.
You can also move forward one word at a time, ignoring symbols and punctuation
marks, using the command W (note the uppercase W). It causes the cursor to move to
the first character following a blank space. Cursor movement using W looks like this:
move the cursor, delete lines, and insert characters,
To move backward one word at a time, use the command b. The B command allows
you to move backward one word at a time, ignoring punctuation.
With any of the w, W, b, or B commands, you can multiply the movement with
numbers. For example, 2w moves forward two words; 5B moves back five words,
ignoring punctuation. Practice using the cursor movement commands, combining them
with numeric multipliers.
Simple Edits
When you enter text in your file, it is rarely perfect. You find errors or want to
improve a phrase. After you enter text, you have to be able to change it.
What are the components of editing? You want to insert text (a forgotten word or
a missing sentence). And you want to delete text (a stray character or an entire para-
graph). You also need to change letters and words (correct misspellings or reflect a
change of mind). You want to move text from one place to another part of your file.
And on occasion, you want to copy text to duplicate it in another part of your file.
There are four basic edit commands: i for insert (which you have already seen),
c for change, d for delete, and y for yank (copy). To move text, you combine d
with p (delete and put). Each type of edit is described in this section. Table 3-1 gives
a few simple examples.
I'll start putting
together a written
plan that shows
different strategies

3k    move up 3 lines
In the previous example, vi moves existing text to the right as the new text is inserted.
That is because we are showing vi on an “intelligent” terminal, which can adjust the
screen with each character you type. An insert on a “dumb” terminal (such as an
adm3a) will look different. The terminal itself cannot update the screen for each character
typed (without a tremendous sacrifice of speed), so vi doesn’t rewrite the screen
until after you press ESC. Rather, when you type, the dumb terminal appears to
overwrite the existing text. When you press ESC, the line is adjusted immediately so
that the missing characters reappear. Thus, on a dumb terminal, the same insert would
appear as follows:
Changing Text
You can replace any text in your file with the change command, c. To identify the
amount of text that you want replaced, combine the change command with a movement
command. For example, c can be used to change text from the cursor
cw     to the end of a word
2cb    back two words
c$     to the end of a line
Then you can replace the identified text with any amount of new text: no characters at
all, one word, or hundreds of lines. The c command leaves you in insert mode until
you press the ESC key.
Words
You can replace a word (cw) with a longer word, a shorter word, or any amount of text.
The cw command can be thought of as “delete the word marked and insert new text
until ESC is pressed.”
Suppose that you have the following lines in your file letter and want to
change designing to putting together. You only need to change one word.
Note that the cw command places a $ at the last character of the word to be changed.
You don’t need to replace the specified number of words, characters, or lines with a like
amount of text. For example:
Lines
To replace the entire current line, there is the special change command cc. This com-
mand changes an entire line, replacing that line with the text entered before an ESC.
The cc command replaces the entire line of text, regardless of where the cursor is
located on the line.
The C command replaces characters from the current cursor position to the end
of the line. It has the same effect as combining c with the special end-of-line indicator,
$ (as in c$).
Characters
One other replacement edit is performed with the r command. This command replaces
a single character with another single character. One of its uses is to correct misspellings.
You probably don't want to use cw in such an instance, because you would
have to retype the entire word. Use r to replace a single character at the cursor:
The r command makes only a single character replacement. You do not have to press
ESC to finish the edit. Following an r command, you are automatically returned to
command mode.
Deleting Text
You can also delete any text in your file with the delete command, d. Like the change
command, the delete command requires an argument (the amount of text to be operated
on). You can delete by word (dw), by line (dd and D), or by other movement com-
mands that you will learn later.
With all deletions, you move to where you want the edit to take place and enter
the delete command (d) followed by the amount of text to be deleted (such as a text
object, w for word).
Words
Suppose that in the following text you want to delete one instance of the word start in
the first line.
The dw command deletes from the cursor's position to the end of a word. Thus, dw
can be used to delete a portion of a word.
thatth shows different      dw             thatshows different
                            delete word
As you can see, dw deleted not only the remainder of the word, but also the space
before any subsequent word on the same line. To retain the space between words, use
de, which will delete only to the end of the word.
de    delete to word end
You can also delete backwards (db) or to the end or beginning of a line (d$ or d0).
Lines
The dd command deletes the entire line that the cursor is on. Using the same text as
in the previous example, with the cursor positioned on the first line as shown, you can
delete the first two lines:
If you are using a dumb terminal or one working at less than 1200 baud, line deletions
look different. The dumb or slow terminal will not redraw the screen until you scroll
past the bottom of the screen. Instead the deletion appears as:
An @ symbol “holds the place” of the deleted line, until the terminal redraws the
entire screen. (You can force vi to redraw the screen immediately by pressing either
CTRL-L or CTRL-R, depending on the terminal you’re using.)
The D command deletes from the cursor position to the end of the line:
Today, I'll start
putting together a

D    delete to end of line
Characters
Often, while editing a file, you want to delete a single character or two. Just as r
changes one character, x deletes a single character. The x command deletes any character
the cursor is on. In the following line, you can delete the letter l by pressing x.
The X command deletes the character before the cursor. Prefix either of these commands
with a number to delete that number of characters. For example, 5X will delete
the five characters to the left of the cursor.
Moving Text
You can move text by deleting it and then placing that deleted text elsewhere in the file,
like a “cut and paste.” Each time you delete a text block, that deletion is temporarily
saved in a buffer. You can move to another position in the file and use the put com-
mand to place the text in a new position. Although you can move any block of text,
this command sequence is more useful with lines than with words.
The put command, p, places saved or deleted text (in the buffer) after the cursor
position. The uppercase version of the command, P, puts the text before the cursor. If
you delete one or more lines, p puts the deleted text on a new line(s) below the cursor.
If you delete a word, p puts the deleted text on the same line after the cursor.
Suppose that in your file letter you have the following lines and you want to
move the fourth line of text. Using delete, you can make this edit. First delete the line
in question:
Then use p to restore the deleted line at the next line below the cursor:
You can also use xp (delete character and put after cursor) to transpose two letters.
For example, in the word mvoe, the letters vo are transposed (reversed). To correct this,
place the cursor on v and press x then p.
After you delete the text, you must restore it before the next change or delete
command. If you make another edit that affects the buffer, your deleted text will be
lost. You can repeat the put command over and over, as long as you don’t make a new
edit. In the advanced vi chapter, you will learn how to retrieve text from named and
numbered buffers.
Copying Text
Often, you can save editing time (and keystrokes) by copying part of your file to
another place. You can copy any amount of existing text and place that copied text
elsewhere in the file with the two commands y (yank) and p (put). The yank com-
mand is used to get a copy of text into the buffer without altering the original text.
This copy can then be placed elsewhere in the file with the put command.
Yank can be combined with any movement command (for example, yw, y$, or
4yy). Yank is most frequently used with a line (or more) of text, because to yank and
put a word generally takes longer than simply inserting the word. For example, to yank
five lines of text:
To place the yanked text, move the cursor to where you want to put the text, and
use the p command to insert it below the current line, or P to insert it above the
current line.
5 more lines
The yanked text will appear on the line below the cursor. Deleting uses the same buffer
as yanking. Delete and put can be used in much the same way as yank and put. Each
new deletion or yank replaces the previous contents of the yank buffer. As we’ll see
later, up to nine previous yanks or deletions can be recalled with put commands.
Yesterday, I received
the product demo.
other materials

.    repeat last command (dd)
In some versions of vi, the command CTRL-@ (^@) repeats the last insert (or
append) command. This is in contrast to the . command, which repeats the last command
that changed the text, including delete or change commands.
You can also undo your last command if you make an error. To undo a com-
mand, the cursor can be anywhere on the screen. Simply press u to undo the last com-
mand (such as an insertion or deletion).
To continue the previous example:
Yesterday, I received
the product demo.
other materials

u    undo last command
The uppercase version of u (U) undoes all edits on a single line, as long as the cursor
remains on that line. After you move off a line, you can no longer use U.
Yesterday,                  J             Yesterday, I received
I received                  join lines    the product demo.
the product demo.
The :q! command quits the file you are in. All edits made since the last time you
saved the file are lost.
You can get by in vi using only the commands you have learned in this session.
However, to harness the real power of vi (and increase your own productivity) you
will want to continue to the next session.
screens;
text blocks;
searches for patterns;
lines.
Movement by Screens
When you read a book you think of “places” in the book by page: the page where you
stopped reading or the page number in an index. Some vi files take up only a few
lines, and you can see the whole file at once. But many files have hundreds of lines.
You can think of a vi file as text on a long roll of paper. The screen is a window
of (usually) 24 lines of text on that long roll. In insert mode, as you fill up the
screen with text, you will end up typing on the bottom line of the screen. When you
reach the end and press RETURN, the top line rolls out of sight, and a blank line for
new text appears on the bottom of the screen. This is called scrolling. You can move
through a file by scrolling the screen ahead or back to see any text in the file.
There are also commands to scroll the screen up one line (^E) and down one line (^Y).
(These commands are not available on small systems, such as the PDP-11 or Xenix for
the PC-XT.)
The H command moves the cursor from anywhere on the screen to the first, or home,
line. The M command moves to the middle line, L to the last. To move to the line
below the first line, use 2H.
These screen movement commands can also be used for editing. For example, dH
deletes to the top line shown on the screen.
The ^ command moves to the first character of the line, ignoring any spaces or tabs.
(0, by contrast, moves to the first position of the line, even if that position is blank.)
In our conversation
last Thursday, we ...

Going through a demo
session gave me ...

{    go to start of previous paragraph
Most people find it easier to visualize moving ahead, so the forward commands
are generally more useful.
Remember that you can combine numbers with movement. For example, 3)
moves ahead three sentences. Also remember that you can edit using movement commands:
d) deletes to the end of the current sentence, 2y} copies (yanks) two paragraphs
ahead.
Movement by Searches
One of the most useful ways to move around quickly in a large file is by searching for
text, or, more properly, for a pattern of characters. The pattern can include a “wild-
card” shorthand that lets you match more than one character. For example, you can
search for a misspelled word or each occurrence of a variable in a program.
The search command is the slash character (/). When you enter a slash, it
appears on the bottom line of the screen; then type in the pattern (a word or other string
of characters) that you want to find:
/text<RETURN> search forward for text
A space before or after text will be included in the search. As with all bottom-line com-
mands, press RETURN to finish.
The search begins at the cursor and moves forward, wrapping around to the start
of the file if necessary. The cursor will move to the first occurrence of the pattern (or
the message “Pattern not found” will be shown on the status line if there is no match).
If you wanted to search for the pattern shows:
/th
The search proceeds forward from the present position in the file. You can give any
combination of characters; a search does not have to be for a complete word.
You can also search backwards using the ? command:
?text<RETURN> search backward for text
The last pattern that you searched for remains available throughout your editing
session. After a search, instead of repeating your original keystrokes, you can use a
command to search again for the last pattern.
n            repeat search in same direction
N            repeat search in opposite direction
/<RETURN>    repeat search in forward direction
?<RETURN>    repeat search in backward direction
Because the last pattern remains available, you can search for a pattern, do some
work, and then search again for the pattern without retyping by using n, N, /, or ?.
The direction of your search (/=forwards, ?=backwards) is displayed at the bottom left
of the screen.
Continuing the previous example, the pattern th is still available to search for:
This section has given only the barest introduction to searching for patterns. Chapter 7
will teach more about pattern matching and its use in making global changes to a file.
Today, I'll start

f'    find first ' in line
Use df' to delete up to and including the named character (in this instance '). This
command is useful in deleting or copying partial lines.
The t command works just like f, except it positions the cursor just before the
character searched for. As with f and b, a numeric prefix will locate the nth
occurrence. For example:
If you are going to move by line numbers, you need a way to identify line
numbers. Line numbers can be displayed on the screen using the :set nu option
described in Chapter 7. In vi, you can also display the current line number on the
bottom of the screen.
The command ^G displays the following on the bottom of your screen: the
current line number, the total number of lines in the file, and what percentage of the
total the present line number represents. For example, for the file letter, ^G might
display:
"letter" line 10 of 40 --25%--
^G is used to display the line number to use in a command, or to orient yourself if you
have been distracted from your editing session.
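The percentage in that display is simple arithmetic, sketched below. The function name position_percent is hypothetical, and truncating to a whole number is an assumption about how the figure is rounded.

```shell
# position_percent LINE TOTAL: how far LINE is through a TOTAL-line file
position_percent() {
    echo $(( $1 * 100 / $2 ))    # integer division drops any fraction
}

position_percent 10 40     # prints 25
position_percent 30 40     # prints 75
```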
The G (go to) command uses a line number as a numeric argument, and moves to
the first position on that line. For instance, 44G moves the cursor to the beginning of
line 44. The G command without a line number moves the cursor to the last line of the
file.
Two single quotes ('') return you to the beginning of the line you were originally
on. Two backquotes (``) return you to your original position exactly. If you
have issued a search command (/ or ?), `` will return the cursor to its position when
you started the search.
The total number of lines shown with ^G can be used to give yourself a rough
idea of how many lines to move. If you are on line 10 of a 1000-line file:
"ch01" line 10 of 1000 --1%--
and know that you want to begin editing near the end of that file, you could give an
approximation of your destination with:
800G
Movement by line number can get you around quickly in a large file.
Command-Line Options
There are other options to the vi command that can be helpful. You can open a file
directly to a specific line number or pattern. You can also open a file in read-only
mode. Another option recovers all changes to a file that you were editing when the system
crashed.
Today I'll start putting together a
written plan that presents the different
strategies for the Alcuin
There can be no spaces in the pattern because characters after a space are interpreted as
filenames.
If you have to leave an editing session before you are finished, you can mark your
place by inserting a pattern such as ZZZ or HERE. Then when you return to the file,
all you have to remember is /ZZZ or /HERE.
Read-only Mode
There will be times that you want to look at a file, but you want to protect that file from
inadvertent keystrokes and changes. (You might want to call in a lengthy file to practice
vi movements, or you might want to scroll through a command file or program.)
If you enter a file in read-only mode, you can use all the vi movement commands, but
you cannot change the file with any edits. To look at your file letter in read-only
mode, you can enter either:
$ vi -R letter
or:
$ view letter
Recovering a Buffer
Occasionally, there will be a system failure while you are editing a file. Ordinarily, any
edits made after your last write (save) are lost. However, there is an option, -r, which
lets you recover the edited buffer at the time of a system crash. (A system program
called preserve saves the buffer as the system is going down.)
When you first log in after the system is running again, you will receive a mail
message stating that your buffer is saved. The first time that you call in the file, use the
-r option to recover the edited buffer. For example, to recover the edited buffer of the
file letter after a system crash, enter:
$ vi -r letter
If you first call in the file without using the -r option, your buffered edits are lost.
You can force the system to preserve your buffer even when there is not a crash
by using the command :pre. You may find this useful if you have made edits to a
file, then discover you can’t save your edits because you don’t have write permission.
(You could also just write a copy of the file out under another name or in a directory
where you do have write permission.)
Customizing vi
A number of options that you can set as part of your editing environment affect how
vi operates. For example, you can set a right margin that will cause vi to wrap lines
automatically, so you don’t need to insert carriage returns.
You can change options from within vi by using the :set command. In addition,
vi reads an initialization file in your home directory called .exrc for further
operating instructions. By placing set commands in this file, you can modify the
way vi acts whenever you use it.
You can also set up .exrc files in local directories to initialize various options
that you want to use in different environments. For example, you might define one set
of options for editing text, but another set for editing source programs. The .exrc
file in your home directory will be executed first, then the one in your current directory.
For example, to specify that pattern searches should ignore case, you type:
:set ic
Some options have values. For example, the option window sets the number of
lines shown in the screen “window.” You set values for these options with an equals
sign (=). For example:
:set window=20
During a vi session, you can check what options are available. The command:
:set all
displays the complete list of options, including options that you have set and defaults
that vi has chosen. The display will look something like this:
You can also ask about the setting for any individual option by name, using the com-
mand:
:set option?
The command :set shows options that you have specifically changed, or set, either in
your .exrc file or during the current session. For example, the display might look
like this:
number window=20 wrapmargin=10
The .exrc File
The .exrc file that controls the vi environment for you is in your home directory.
Enter into this file the set options that you want to have in effect whenever you use
vi or ex.
The .exrc file can be modified with the vi editor, like any other file. A sample
.exrc file might look like this:
set wrapmargin=10 window=20
Because the file is actually read by ex before it enters visual mode (vi), commands in
.exrc should not have a preceding colon.
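As a minimal sketch, the sample file can also be created from the shell. The scratch directory is an assumption made here so that an existing .exrc in your home directory is not overwritten; copy the result into place only when you are satisfied with it.

```shell
# Work in a scratch directory so a real ~/.exrc is not overwritten
scratch=$(mktemp -d)
exrc="$scratch/.exrc"

# The set command goes in without a leading colon
cat > "$exrc" <<'EOF'
set wrapmargin=10 window=20
EOF

# Report any line that starts with a colon
if grep -q '^:' "$exrc"; then
    echo "remove the leading colons from $exrc"
else
    echo "$exrc looks fine"
fi
```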
Alternate Environments
You can define alternate vi environments by saving option settings in an .exrc file
that is placed in a local directory. If you enter vi from that directory, the local
.exrc file will be read in. If it does not exist, the one in your home directory will be
read in.
For example, you might want to have one set of options for programming:
set number lisp autoindent sw=4 tags=/usr/lib/tags terse
and another set of options for text editing:
set wrapmargin=15 ignorecase
Local .exrc files are especially useful when you define abbreviations, which are
described in Chapter 7.
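The lookup just described can be sketched as a small shell function. The name which_exrc and the demonstration directories are hypothetical; the sketch follows the rule as stated in this section (local file if present, otherwise the home file) and does not model any version-specific behavior.

```shell
# which_exrc DIR HOMEDIR: print the .exrc that would apply in DIR
which_exrc() {
    if [ -f "$1/.exrc" ]; then
        echo "$1/.exrc"          # a local file was found
    elif [ -f "$2/.exrc" ]; then
        echo "$2/.exrc"          # fall back to the home directory
    else
        echo "none"
    fi
}

# Demonstration in a scratch area: one directory with its own .exrc, one without
demo=$(mktemp -d)
mkdir "$demo/home" "$demo/prog" "$demo/text"
echo 'set wrapmargin=15 ignorecase' > "$demo/home/.exrc"
echo 'set number autoindent sw=4'   > "$demo/prog/.exrc"

which_exrc "$demo/prog" "$demo/home"    # the local file in prog
which_exrc "$demo/text" "$demo/home"    # no local file, so the home file
```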
There is one option that is almost essential for editing nonprogram text. The
wrapmargin option specifies the size of the right margin that will be used to
autowrap text as you type. (This saves manually typing carriage returns.) This option
is in effect if its value is set to greater than 0. A typical value is 10 or 15:
set wrapmargin=15
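Because the margin is measured from the right edge of the screen, the column at which wrapping begins depends on the screen width. A rough sketch of that arithmetic follows; the function name wrap_column is hypothetical and an 80-column screen is assumed when no width is given.

```shell
# wrap_column WRAPMARGIN [WIDTH]: column at which autowrap begins
wrap_column() {
    echo $(( ${2:-80} - $1 ))
}

wrap_column 15        # prints 65
wrap_column 10        # prints 70
wrap_column 10 132    # prints 122
```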
There are also three options that control how vi acts in conducting a search. By
default, it differentiates between uppercase and lowercase (foo does not match Foo),
wraps around to the beginning of the file during a search (this means you can begin
your search anywhere in the file and still find all occurrences), and recognizes wildcard
characters when matching patterns. The default settings that control these options are
noignorecase, wrapscan, and magic, respectively. To change any of these
defaults, set the opposite toggles: ignorecase, nowrapscan, or nomagic.
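The effect of wrapscan can be illustrated outside the editor with a short awk sketch. The function name search_forward is hypothetical, the pattern is an awk regular expression rather than a vi one, and the search works on whole lines only, treating the current line as already passed.

```shell
# search_forward PATTERN CURRENT_LINE FILE [wrap|nowrap]
# Prints the number of the next matching line, as a forward search would find it.
search_forward() {
    awk -v pat="$1" -v cur="$2" -v mode="${4:-wrap}" '
        $0 ~ pat {
            if (NR > cur && !after)   after = NR    # first match below the current line
            if (NR <= cur && !before) before = NR   # first match at or above it
        }
        END {
            if (after)                          print after
            else if (before && mode == "wrap")  print before   # wrapped to the top
            else                                print "Pattern not found"
        }' "$3"
}

sample=$(mktemp)
printf '%s\n' 'With the screen editor' 'you can scroll the page,' \
              'move the cursor,' 'and delete lines.' > "$sample"

search_forward 'the' 1 "$sample"           # prints 2
search_forward 'the' 3 "$sample"           # prints 1 (wrapped around)
search_forward 'the' 3 "$sample" nowrap    # prints Pattern not found
```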
Another useful option is shiftwidth. This option was designed to help programmers
properly indent their programs, but it can also be useful to writers. The >>
and << commands can be used to indent (or un-indent) text by shiftwidth characters.
The position of the cursor on the line doesn’t matter; the entire line will be
shifted. The shiftwidth option is set to 8 by default, but you can use :set to
change this value.
Give the >> or << command a numeric prefix to affect more than one line. For
example:
10>>
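The result of a counted shift can be approximated outside the editor. In this sketch the function name shift_right is hypothetical, the cursor is assumed to be on the first line of the file, and the indent is made of spaces only, which is a simplification of what the editor inserts.

```shell
# shift_right COUNT WIDTH FILE: indent the first COUNT lines by WIDTH spaces
shift_right() {
    awk -v n="$1" -v sw="$2" '
        BEGIN { for (i = 0; i < sw; i++) pad = pad " " }    # build the indent
        NR <= n && length($0) > 0 { print pad $0; next }    # shift non-empty lines
        { print }' "$3"
}

draft=$(mktemp)
printf '%s\n' 'first' 'second' 'third' > "$draft"

# Like 2>> with shiftwidth=8 and the cursor on line 1
shift_right 2 8 "$draft"
```

Only the first two lines are indented; the third is printed unchanged.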
You can also combine numbers with any of the commands in Table 3-2 to multiply
them. For example, 2c) changes the next two sentences. Although this table may
seem forbidding, experiment with combinations and try to understand the patterns.
When you find how much time and effort you can save, combinations of change and
movement keys will no longer seem obscure, but will readily come to mind.
Using Buffers
While you are editing, you have seen that your last deletion (d or x) or yank (y) is
saved in a buffer (a place in stored memory). You can access the contents of that buffer
and put the saved text back in your file with the put command (p or P).
The last nine deletions are stored by vi in numbered buffers. You can access
any of these numbered buffers to restore any (or all) of the last nine deletions. You can
also place yanks (copied text) in buffers identified by letters. You can fill up to 26
buffers (a through z) with yanked text and restore that text with a put command any
time in your editing session.
The vi program also saves your last edit command (insert, change, delete, or
yank) in a buffer. Your last command is available to repeat or undo with a single key-
stroke.
Recovering Deletions
Being able to delete large blocks of text at a single bound is all well and good, but what
if you mistakenly delete 53 lines that you need? There is a way to recover any of your
past nine deletions, which are saved in numbered buffers. The last deletion is saved in
buffer 1; the second-to-last in buffer 2, and so on.
To recover a deletion, type " (quotation mark), identify the buffered text by
number, and then give the put command. For example, to recover your second-to-last
deletion from buffer 2, type:
"2p
Sometimes it's hard to remember what's in the last nine buffers. Here's a trick
that can help.
The . command (repeat last command) has a special meaning when used with p
and u. The p command will print the last deletion or change, but 2p will print the
last two. By combining p, . (dot), and u (undo), you can step back through the
numbered buffers.
The "1p command will put the last deletion, now stored in buffer 1, back into
your text. If you then type u, it will go away. But when you type the . command,
instead of repeating the last command ("1p), it will show the next buffer as if you'd
typed "2p. You can thus step back through the buffers. For example, the sequence:
"1pu.u.u.u.u.
will show you, in sequence, the contents of the last six numbered buffers.
After loading the named buffers and moving to the new position, use p or P to
put the text back.
"dP    put buffer d before cursor
"ap    put buffer a after cursor
In our conversation last
Thursday, we discussed a
documentation project
that would produce a
user's manual on the
Alcuin product.

"a6yy    yank 6 lines to buffer a (status line: 6 lines yanked)
"ap      put buffer a after cursor
There is no way to put part of a buffer into the text; it is all or nothing.
Named buffers allow you to make other edits before placing the buffer with p.
After you know how to travel between files without leaving vi, you can use named
buffers to selectively transfer text between files.
You can also delete text into named buffers, using much the same procedure. For
example:
"a5dd delete five lines into buffer a
If you specify the buffer name with a capital letter, yanked or deleted text will be
appended to the current contents of the buffer. For example:
"byy      yank current line into buffer b
"B5dd     delete five lines and append to buffer b
3}        move down three paragraphs
"bP insert the six lines from buffer b above the cursor
When you put text from a named buffer, a copy still remains in that buffer; you can
repeat the put as often as you like until you quit your editing session or replace the text
in the buffer.
For example, suppose you were preparing a document with some repetitive ele-
ments, such as the skeleton for each page of the reference section in a manual. You
could store the skeleton in a named buffer, put it into your file, fill in the blanks, then
put the skeleton in again each time you need it.
     putting together a
     written plan that

mx G      mark and move to end of file

     Sincerely,

     Fred Caslon
Place markers are set only during the current vi session; they are not stored in the file.
nroff and troff
The vi editor lets you edit text, but it is not much good at formatting. A text file such
as program source code might be formatted with a simple program like pr, which
inserts a header at the top of every page and handles pagination, but otherwise prints the
document exactly as it appears in the file. But for any application requiring the
preparation of neatly formatted text, you will use the nroff (“en-roff”) or troff
(“tee-roff”) formatting program.
These programs are used to process an input text file, usually coded or “marked
up” with formatting instructions. When you use a wysiwyg program like most word
processors, you use commands to lay out the text on the screen as it will be laid out on
the page. With a markup language like that used by nroff and troff, you enter
commands into the text that tell the formatting program what to do.
Our purpose in this chapter is twofold. We want to introduce the basic formatting
codes that you will find useful. But at the same time, we want to present them in the
context of what the formatter is doing and how it works. If you find this chapter
rough going, especially if this is your first exposure to nroff/troff, skip ahead
to either Chapter 5 or Chapter 6 and become familiar with one of the macro packages,
ms or mm; then come back and resume this chapter. We assume that you are reading
this book because you would like more than the basics, that you intend to master the
complexities of nroff/troff. As a result, this chapter is somewhat longer and
more complex than it would be if the book were an introductory user’s guide.
Conventions
To distinguish input text and requests shown in examples from formatter output,
we have adopted the convention of showing “page corners” around output from
nroff or troff. Output from nroff is shown in the same constant-width
typeface as other examples:
Output from troff is shown in the same typeface as the text, but with the size of the
type reduced by one point, unless the example calls for an explicit type size:
     Here is an example of troff output.
In representing output, compromises sometimes had to be made. For example, when
showing nroff output, we have processed the example separately with nroff, and
read the results back into the source file. However, from there, they have been typeset
in a constant-width font by troff. As a result, there might be slight differences from
true nroff output, particularly in line length or page size. However, the context
should always make clear just what is being demonstrated.
You set aside part of the page as the text area. This requires setting top, bot-
tom, left, and right margins.
You adjust the lines that you type so they are all approximately the same
length and fit into the designated text area.
You break the text into syntactic units such as paragraphs.
You switch to a new page when you reach the bottom of the text area.
Left to themselves, nroff or troff will do only one of these tasks: they will
adjust the length of the lines in the input file so that they come out even in the output
file. To do so, they make two assumptions:
The process of filling and adjusting is intuitively obvious; we’ve all done much the
same thing manually when using a typewriter or had it done for us by a wysiwyg word
processor. However, especially when it comes to a typesetting program like troff,
there are ramifications to the process of line adjustment that are not obvious. Having a
clear idea of what is going on will be very useful later. For this reason, we’ll examine
the process in detail.
Line Adjustment
There are three parts to line adjustment: filling, justification, and hyphenation. Filling
is the process of making all lines of text approximately equal in length. When working
on a typewriter, you do this automatically, simply by typing a carriage return when the
line is full. Most word-processing programs automatically insert a carriage return at the
end of a line, and we have seen how to set up vi to do so as well.
However, nroff and troff ignore carriage returns in the input except in a
special “no fill” mode. They reformat the input text, collecting all input lines into
even-length output lines, stopping only when they reach a blank line or (as we shall see
shortly) a formatting instruction that tells them to stop. Lines that begin with one or
more blank spaces are not filled, but trailing blank spaces are trimmed. Extra blank
spaces between words on the input line are preserved, and the formatter adds an extra
blank space after each period, question mark, or exclamation point.
Justification is a closely related feature that should not be confused with filling.
Filling simply tries to keep lines approximately the same length; justification adjusts the
space between words so that the ends of the lines match exactly.
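As a rough illustration of filling on its own, the following Python sketch packs the words of the input into output lines no longer than a given length, stopping at blank lines. The function name fill and the 30-character line length are assumptions made for the example; the sketch ignores leading spaces, sentence endings, and hyphenation, all of which the real formatter takes into account.

```python
def fill(input_lines, line_length=30):
    """Greedy filling: pack words into lines of at most line_length characters."""
    output, current = [], ""
    for line in input_lines:
        if not line.strip():              # a blank line stops the filling
            if current:
                output.append(current)
                current = ""
            output.append("")
            continue
        for word in line.split():
            candidate = word if not current else current + " " + word
            if len(candidate) <= line_length or not current:
                current = candidate       # the word still fits on this line
            else:
                output.append(current)    # the line is full: output it
                current = word            # and start collecting the next one
    if current:
        output.append(current)
    return output


filled = fill([
    "In our conversation last Thursday, we discussed a",
    "documentation project",
    "and reference manual",
])
```

The short input line in the middle makes no difference to the result: the lines of filled are roughly equal in length, but none has been padded out to exactly 30 characters, which is the job of justification.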
By default, nroff and troff both fill and justify text. Justification implies
filling, but it is possible to have filling without justification. Let’s look at some examples.
First, we’ll look at a paragraph entered in vi. Here’s a paragraph from the letter
you entered in the last chapter, modified so that it offers to prepare not just a user’s
guide for the Alcuin illuminated lettering software, but a reference manual as well. In
the course of making the changes, we’ve left a short line in the middle of the paragraph.
In our conversation last Thursday, we discussed a
documentation project that would produce a user’s guide
and reference manual
for the Alcuin product. Yesterday, I received the product
demo and other materials that you sent me.
As you can see, nroff justified the text in the first example by adding extra space
between words.
Most typewritten material is filled but not justified. In printer’s terms, it is typed
ragged right. Books, magazines, and other typeset materials, by contrast, are usually
right justified. Occasionally, you will see printed material (such as ad copy) in which
the right end of each line is justified, but the left end is ragged. It is for this reason that
we usually say that text is right or left justified, rather than simply justified.
When it is difficult to perform filling or justification or both because a long word
falls at the end of a line, the formatter has another trick to fall back on (one we are all
familiar with): hyphenation.
The nroff and troff programs perform filling, justification, and hyphenation
in much the same way as a human typesetter used to set cold lead type. Human
typesetters used to assemble a line of type by placing individual letters in a tray until
each line was filled. There were several options for filling as the typesetter reached the
end of the line:
If, in addition to being filled, the text was to be justified, there was one additional issue:
after the line was approximately the right length, space needed to be added between
each word so that the line length came out even.
Just like the human typesetter they replace, nroff and troff assemble one
line of text at a time, measuring the length of the line and making adjustments to the
spacing to make the line come out even (assuming that the line is to be justified). Input
lines are collected into a temporary storage area, or buffer, until enough text has been
collected for a single output line. Then that line is output, and the next line collected.
It is in the process of justification that you see the first significant difference
between the two programs. The nroff program was designed for use with
typewriter-like printers; troff was designed for use with phototypesetters.
A typewriter-style printer has characters all of the same size: an i takes up the
same amount of space as an m. (Typical widths are 1/10 or 1/12 inch per character.)
And although some printers (such as daisywheel printers) allow you to change the style
of type by changing the daisywheel or thimble, you can usually have only one typeface
at a time.
A typesetter, by contrast, uses typefaces in which each letter takes up an amount
of space proportional to its outline. The space allotted for an i is quite definitely narrower
than the space allotted for an m. The use of variable-width characters makes the
job of filling and justification much more difficult for troff than for nroff.
Where nroff only needs to count characters, troff has to add up the width of
each character as it assembles the line. (Character widths are defined by a “box”
around the character, rather than by its natural, somewhat irregular shape.)
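The difference between counting characters and adding up widths can be sketched in a few lines of Python. The width values below are invented for the example and are not taken from any real font table; the function names are likewise our own.

```python
# Hypothetical character widths in arbitrary units. A real troff reads its
# widths from font tables, whose values are not reproduced here.
WIDTHS = {"i": 5, "l": 5, "m": 15, "w": 13, " ": 6}
DEFAULT_WIDTH = 9   # assumed width for any character not listed above


def fixed_width(text, char_width=10):
    """nroff-style measure: every character occupies the same space."""
    return len(text) * char_width


def proportional_width(text):
    """troff-style measure: add up the width of each character's box."""
    return sum(WIDTHS.get(ch, DEFAULT_WIDTH) for ch in text)
```

With these made-up figures, "ill" and "mmm" measure the same under fixed_width, but proportional_width gives 15 units for the first and 45 for the second, so the formatter cannot know how much fits on a line without consulting the width table.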
The troff program also justifies by adding space between words, but because
the variable-width fonts it uses are much more compact, it fits more on a line and generally
does a much better job of justification.*
There’s another difference as well. Left to itself, nroff will insert only full
spaces between words; that is, it might put two spaces between one pair of words, and
three between another, to fill the line. If you call nroff with the -e option, it will
attempt to make all interword spaces the same size (using fractional spaces if possible).
But even then, nroff will only succeed if the output device allows fractional spacing.
The troff program always uses even interword spacing.
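Justifying with whole spaces only, as a typewriter-like device requires, might be sketched as follows in Python. The function name justify is our own, and the choice to give any leftover spaces to the leftmost gaps is an assumption of the sketch; the real nroff may distribute them differently.

```python
def justify(line, line_length):
    """Pad a filled line to line_length by widening the gaps between words
    with whole spaces only."""
    words = line.split()
    gaps = len(words) - 1
    extra = line_length - len(" ".join(words))
    if gaps < 1 or extra <= 0:
        return " ".join(words)            # nothing to adjust
    each, leftover = divmod(extra, gaps)
    pieces = []
    for i, word in enumerate(words[:-1]):
        # Every gap gets its own space plus an equal share of the padding;
        # the first `leftover` gaps get one space more (an assumption).
        pad = 1 + each + (1 if i < leftover else 0)
        pieces.append(word + " " * pad)
    pieces.append(words[-1])
    return "".join(pieces)
```

When the padding does not divide evenly among the gaps, some pairs of words end up further apart than others, which is exactly the unevenness that fractional spacing is meant to avoid.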
Here’s the same paragraph filled and justified by troff:
*The very best typesetting programs have the capability to adjust the space between individual characters as
well. This process is called kerning. SoftQuad Publishing Software in Toronto sells an enhanced version of
troff called SQroff that does support kerning.
The troff program gets information about the widths of the various characters
in each font from tables stored on the system in the directory /usr/lib/font.
These tables tell troff how far to move over after it has output each character on the
line.
We’ll talk more about troff later. For the moment, you should be aware that
the job of the formatting program is much more complicated when typesetting than it is
when preparing text for typewriter-style printers.
Using nroff
As mentioned previously, left to themselves, nroff and troff perform only rudimentary
formatting. They will fill and justify the text, using a default line length of 6.5
inches, but they leave no margins, other than the implicit right margin caused by the
line length. To make this clearer, let’s look at the sample letter from the last chapter
(including the edit we made in this chapter) as it appears after formatting with nroff.
First, let’s look at how to invoke the formatter. The nroff program takes as an
argument the name of a file to be formatted:
$ nroff letter
Alternatively, it can take standard input, allowing you to preprocess the text with some
other program before formatting it:
$ tbl report | nroff
There are numerous options to nroff. They are described at various points in this
book (as appropriate to the topic) and summarized in Appendix B.
One basic option is -T, which specifies the terminal (printer) type for which output
should be prepared. Although nroff output is fairly straightforward, some differences
between printers can significantly affect the output. (For example, one printer
may perform underlining by backspacing and printing an underscore under each underlined
letter, and another may do it by suppressing a newline and printing the underscores
in a second pass over the line.) The default device is the Teletype Model 37
terminal, a fairly obsolete device. Other devices are listed in Appendix B. If you
don’t recognize any of the printers or terminals, the safest type is probably lp:
$ nroff -Tlp file
In examples in this book, we will leave off the -T option, but you may want to experiment,
and use whichever type gives the best results with your equipment.
Like most UNIX programs, nroff prints its results on standard output. So,
assuming that the text is stored in a file called letter, all you need to do is type:
$ nroff letter
A few moments later, you should see the results on the screen. Because the letter will
scroll by quickly, you should pipe the output of nroff to a paging program such as
pg or more:
$ nroff letter | pg
or out to a printer using lp or lpr:
$ nroff letter | lp
Using troff
The chief advantage of troff over nroff is that it allows different types of character
sets, or fonts, and so lets you take full advantage of the higher-quality printing available
with typesetters and laser printers. There are a number of requests, useful only in
troff, for specifying fonts, type sizes, and the vertical spacing between lines. Before
we describe the actual requests though, we need to look at a bit of history.
The troff program was originally designed for a specific typesetter, the Wang
C/A/T. Later, it was modified to work with a wide range of output devices. We’ll discuss
the original version of troff (which is still in use at many sites) first, before
discussing the newer versions. The C/A/T typesetter was designed in such a way that it
could use only four fonts at one time.
(Early phototypesetters worked by projecting light through a film containing the
outline of the various characters. The film was often mounted on a wheel that rotated
to position the desired character in front of the light source as it flashed, thus photo-
graphing the character onto photographic paper or negative film. Lenses enlarged and
reduced the characters to produce various type sizes. The C/A/T typesetter had a wheel
divided into four quadrants, onto which one could mount four different typefaces.)
Typically, the four fonts were the standard (roman), bold, and italic fonts of the
same family, plus a “special” font that contained additional punctuation characters,
Greek characters (for equations), bullets, rules, and other nonstandard characters. Fig-
ure 4-1 shows the characters available in these standard fonts.
The Coming of ditroff
Later, troff was modified to support other typesetters and, more importantly (at least
from the perspective of many readers of this book), laser printers. The later version of
troff is often called ditroff (for device-independent troff), but many UNIX
systems have changed the name of the original troff to otroff and simply call
ditroff by the original name, troff.
The ditroff program has not been universally available because, when it was
developed, it was “unbundled” from the basic UNIX distribution and made part of a
separate product called Documenter’s Workbench or DWB. UNIX system manufacturers
have the option not to include this package, although increasingly, they have been
doing so. Versions of DWB are also available separately from third party vendors.
The newer version of troff allows you to specify any number of different
fonts. (You can mount fonts at up to ten imaginary “positions” with .fp and can
request additional fonts by name.)
[Figure 4-1. Characters available in the four standard fonts: Times Roman, Times Italic, Times Bold, and the Special Mathematical Font]
There may also be different font sizes available, and there are some additional commands
for line drawing (ditroff can draw curves as well as straight lines). For the
most part, though, ditroff is very similar to the original program, except in the
greater flexibility it offers to use different output devices.
One way to find out which version of troff you have on your system (unless
you have a program explicitly called ditroff) is to list the contents of the directory
/usr/lib/font:
$ ls -F /usr/lib/font
devlj/
devps/
ftB
ftI
ftR
ftS
If there are one or more subdirectories whose name begins with the letters dev, your
system is using ditroff. Our system supports both ditroff and otroff, so
we have both a device subdirectory (for ditroff) and font files (for otroff)
directly in /usr/lib/font.
We’ll talk more about font files later. For the moment, all you need to know is
that they contain information about the widths of the characters in various fonts for a
specific output device.
Contrary to what a novice might expect, font files do not contain outlines of the
characters themselves. For a proper typesetter, character outlines reside in the typesetter
itself. All troff sends out to the typesetter are character codes and size and position
information.
However, troff has increasingly come to be used with laser printers, many of
which use downloadable fonts. An electronic image of each character is loaded from
the computer into the printer’s memory, typically at the start of each printing job.
There may be additional “font files” containing character outlines in this case, but
these files are used by the software that controls the printer, and have nothing to do
with troff itself. In other cases, font images are stored in ROM (read-only memory)
in the printer.
If you are using a laser printer, it is important to remember that troff itself has
nothing to do with the actual drawing of characters or images on the printed page. In a
case like this, troff simply formats the page, using tables describing the widths of
the characters used by the printer, and generates instructions about page layout, spacing,
and so on. The actual job of driving the printer is handled by another program, generally
referred to as a printer driver or troff postprocessor.
To use troff with such a postprocessor, you will generally need to pipe the
output of troff to the postprocessor and from there to the print spooler:
$ troff file | postprocessor | lp
If you are using the old version of troff, which expects to send its output directly to
the C/A/T typesetter, you need to specify the -t option, which tells troff to use
standard output. If you don’t, you will get the message:
Typesetter busy.
(Of course, if by any chance you are connected to a C/A/T typesetter, you don’t need
this option. There are several other options listed in Appendix B that you may find useful.)
When you use ditroff, on the other hand, you will need to specify the -T
command-line option that tells it what device you are using. The postprocessor will
then translate the device-independent troff output into instructions for that particular
type of laser printer or typesetter. For example, at our site, we use troff with an
You can print the same file on different devices, simply by changing the -T option and
the postprocessor. For example, you can print drafts on a laser printer, then switch to a
typesetter for final output without making extensive changes to your files. (To actually
direct output to different printers, you will also have to specify a printer name as an
option to the lp command. In our generic example, we simply use lp without any
options, assuming that the appropriate printer is connected as the default printer.)
Like all things in life, this is not always as easy as it sounds. Because the fonts
used by different output devices have different widths even when the nominal font
names and sizes are the same, pagination and line breaks may be different when you
switch from one device to another.
The job of interfacing ditroff to a wide variety of output devices is becoming
easier because of the recent development of industry-wide page description languages
like Adobe Systems’ PostScript, Xerox’s Interpress, and Imagen’s DDL. These page
description languages reside in the printer, not the host computer, and provide a device-
independent way of describing placement of characters and graphics on the page.
Rather than using a separate postprocessor for each output device, you can now
simply use a postprocessor to convert troff output to the desired page description
language. For example, you can use Adobe Systems’ Transcript postprocessor (or an
equivalent postprocessor like devps from Pipeline Associates) to convert troff
output to PostScript, and can then send the PostScript output to any one of a number of
typesetters or laser printers.
From this point, whenever we say troff, we are generally referring to
ditroff. In addition, although we will continue to discuss nroff as it differs from
troff, our emphasis is on the more capable program. It is our opinion that the growing
availability of laser printers will make troff the program of choice for almost all
users in the not too distant future.
However, you can submit a document coded for troff to nroff with entirely
reasonable results. For the most part, formatting requests that cannot be handled by
nroff are simply ignored. And you can submit documents coded for nroff to
troff, though you will then be failing to use many of the characteristics that make
troff desirable.
(\). For example, the escape sequence \l will draw a horizontal line. Especially in
troff, escape sequences are used for line drawing or for printing various special characters
that do not appear in the standard ASCII character set. For instance, you enter
\(bu to get •, a bullet.
There are three classes of formatting instructions:
For the most part, we will discuss the requests used to define macros, strings, and
number registers later in this book.
At this point, we want to focus on understanding the basic requests that control
the basic actions of the formatter. We will also learn many of the most useful requests
with immediate, one-time effects. Table 4-1 summarizes the requests that you will use
most often.
default line length of 6.5 inches, but they leave no margins, other than the implicit right
margin caused by the line length.
To make this clearer, let’s look at the sample letter from the last chapter as it
appears after formatting with nroff, without any embedded requests, and without
using any macro package. From Figure 4-2, you can see immediately that the formatter
has adjusted all of the lines, so that they are all the same length, even in the address
block of the letter, where we would have preferred them to be left as they were. Blank
lines in the input produce blank lines in the output, and the partial lines at the ends of
paragraphs are not adjusted.
The most noticeable aspect of the raw formatting is a little difficult to reproduce
here, though we’ve tried. No top or left margin is automatically allocated by nroff.
April 1, 1987
Sincerely,
Fred Caslon
If you look carefully at the previous example, you will probably notice that we entered
the two formatting requests on blank lines in the letter. If we were to format the letter
now, here is what we’d get:
April 1, 1987
Mr. John Fust
Vice President, Research and Development
Gutenberg Galaxy Software
Waltham, Massachusetts 02159
Dear Mr. Fust:
As you may notice, we’ve lost the blank lines that used to separate the date from the
address block, and the address block from the salutation. Lines containing formatting
requests do not result in any space being output (unless they are spacing requests), so
you should be sure not to inadvertently replace blank lines when entering formatting
codes.
Controlling Justification
Justification can be controlled separately from filling by the .ad (adjust) request.
(However, filling must be on for justification to work at all.) You can adjust text at
either margin or at both margins.
Unlike the .br and .nf requests introduced earlier, .ad takes an argument, which
specifies the type of justification you want:
There is another related request, .na (no adjust). Because the text entered in a
file is usually left justified to begin with, turning justification off entirely with .na
produces similar results to .ad l in most cases.
However, there is an important difference. Normally, if no argument is given to
the .ad request, both margins will be adjusted. That is, .ad is the same as .ad b.
However, following an .na request, .ad reverts to the value last specified. That is,
the sequence:
.ad r
Some text
.ad l
Some text
.ad
Some text
will adjust both margins in the third block of text. However, the sequence:
.ad r
Some text
.na
Some text
.ad
Some text
will adjust only the right margin in the third block of text.
It’s easy to see where you would use .ad b or .ad l. Let’s suppose that
you would like a ragged margin for the body of your letter, to make it look more like it
was prepared on a typewriter. Simply follow the .fi request we entered previously
with .ad l.
Right-only justification may seem a little harder to find a use for. Occasionally,
you’ve probably seen ragged-left copy in advertising, but that’s about it. However, if
you think for a moment, you’ll realize that it is also a good way to get a single line over
to the right margin.
For example, in our sample letter, instead of typing all those leading spaces before
the date (and having it fail to come out flush with the margin anyway), we could enter
the lines:
.ad r
April 1, 1987
.ad b
As it turns out, this construct won’t quite work. If you remember, when filling is
enabled, nroff and troff collect input in a one-line buffer and only output the
saved text when the line has been filled. There are some non-obvious consequences of
this that will ripple all through your use of nroff and troff. If you issue a
request that temporarily sets a formatting condition, then reset it before the line is output,
your original setting may have no effect. The result will be controlled by the
request that is in effect at the time the line is output, not at the time that it is first collected
in the line buffer.
Certain requests cause implicit line breaks (the equivalent of carriage returns on a
typewriter) in the output, but others do not. The .ad request does not cause a break.
Therefore, a construction like:
.ad r
April 1, 1987
.ad b
Mr. John Fust
will result in:
April 1, 1987 Mr. John Fust
and not:
                                   April 1, 1987
Mr. John Fust
To make sure that you get the desired result from a temporary setting like this, be sure
to follow the line to be affected with a condition that will cause a break. For instance,
in the previous example, you would probably follow the date with a blank line or an
.sp request, either of which will normally cause a break. If you don’t, you should put
in an explicit break, as follows:
.ad r
April 1, 1987
.br
.ad b
Mr. John Fust
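The effect of the break can be imitated with a toy model in Python. This is only a sketch of the buffering idea, not of the formatter itself: the function format_lines is our own invention, it understands nothing but .ad with an argument and .br, it models only right adjustment, and it never fills a line to a set length.

```python
def format_lines(source, line_length=40):
    """Toy model: text waits in a one-line buffer, and the adjust mode that
    counts is the one in effect when the buffer is finally output."""
    mode = "b"      # assumed starting mode: adjust both margins
    buffer = []     # words collected for the current output line
    output = []

    def flush():
        if buffer:
            text = " ".join(buffer)
            # Only right adjustment is modelled; any other mode leaves the
            # partial line flush left, as at the end of a paragraph.
            output.append(text.rjust(line_length) if mode == "r" else text)
            buffer.clear()

    for line in source:
        if line.startswith(".ad "):
            mode = line.split()[1]          # changes the mode, causes no break
        elif line == ".br":
            flush()                         # explicit break: output the buffer
        else:
            buffer.extend(line.split())     # text is collected, not yet output
    flush()                                 # end of input outputs the buffer
    return output


no_break = format_lines([".ad r", "April 1, 1987", ".ad b", "Mr. John Fust"])
with_break = format_lines(
    [".ad r", "April 1, 1987", ".br", ".ad b", "Mr. John Fust"]
)
```

In no_break, the date is still waiting in the buffer when .ad b arrives, so it comes out on one line with the name. In with_break, the .br forces the date out while .ad r is still in effect, and it lands at the right margin.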
A final point about justification: the formatter adjusts a line by widening the blank
space between words. If you do not want the space between two words adjusted or split
across output lines, precede the space with a backslash. This is called an unpaddable
space.
There are many obscure applications for unpaddable spaces; we will mention
them as appropriate. Here’s a simple one that may come in handy: nroff and
troff normally add two blank spaces after a period, question mark, or exclamation
point. The formatter can’t distinguish between the end of a sentence and an abbreviation,
so if you find the extra spacing unaesthetic, you might follow an abbreviation like
Mr. with an unpaddable space: Mr.\ John Fust.
Hyphenation
As pointed out previously, hyphenation is closely related to filling and justification, in
that it gives nroff and troff some additional power to produce filled and justified
lines without large gaps.
The nroff and troff programs perform hyphenation according to a general
set of rules. Occasionally, you need to control the hyphenation of particular words.
You can specify either that a word not be hyphenated or that it be hyphenated in a cer-
tain way. You can also turn hyphenation off entirely.
To use .hw, simply specify the word or words that constitute the exception list,
typing a hyphen at the point or points in the word where you would like it to be
hyphenated:
.hw hy-phen-a-tion
You can specify multiple words with one .hw request, or you can issue multiple .hw
requests as you need them.
However, if it is just a matter of making sure that a particular instance of a word
is hyphenated the way you want, you can use the hyphenation indication character
sequence \%. As you type the word in your text, simply type the two characters \% at
each acceptable hyphenation point, or at the front of the word if you don’t want the
word to be hyphenated at all:
\%acknowledge the word acknowledge will not be hyphenated
ac\%know\%ledge the word acknowledge can be hyphenated only
at the specified points
This character sequence is the first instance we have seen of a formatting request that
does not consist of a request name following a period in column one. We will see
many more of these later. This sequence is embedded right in the text but does not
print out.
In general, nroff and troff do a reasonable job with hyphenation. You will
need to set specific hyphenation points only in rare instances. In general, you shouldn’t
even worry about hyphenation points, unless you notice a bad break. Then use either
.hw or \% to correct it.
The UNIX hyphen command can be used to print out all of the hyphenation
points in a file formatted with nroff or troff -a.
$ nroff options files | hyphen
or:
$ troff options -a files | hyphen
If your system doesn’t have the hyphen command, you can use grep instead:
$ nroff options files | grep ' -$'
(The single quotation marks are important because they keep grep from interpreting
the - as the beginning of an option.)
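The same kind of check can be sketched in Python for formatter output that has been saved as a list of lines. The function name hyphenated_line_ends is our own, and the test is deliberately crude: it reports every line whose last non-blank character is a hyphen, whether or not the formatter put it there, so it is no substitute for the hyphen command.

```python
def hyphenated_line_ends(lines):
    """Return the lines whose last non-blank character is a hyphen."""
    return [line for line in lines if line.rstrip().endswith("-")]


sample_output = [
    "As pointed out previously, hyphen-",
    "ation is closely related to filling",
    "and justification. ",
]
bad_break_candidates = hyphenated_line_ends(sample_output)
```

Each line returned is a candidate for a closer look; a bad break found this way can then be corrected with .hw or \%.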
Page Layout
Apart from the adjusted address block, the biggest formatting drawback that you prob-
ably noticed when we formatted the sample letter is that there was no left or top margin.
Furthermore, though it is not apparent from our one-page example, there is no bottom
margin either. If there were enough text in the input file to run onto a second page, you
would see that the text ran continuously across the page boundary.
In normal use, these layout problems would be handled automatically by either
the ms or mm macro packages (described later). Here, though, we want to understand
how the formatter itself works.
Let’s continue our investigation of the nroff and troff markup language
with some basic page layout commands. These commands allow you to affect the
placement of text on the page. Some of them (those whose descriptions begin with the
word set) specify conditions that will remain in effect until they are explicitly changed
by another instance of the same request. Others have a one-time effect.
As shown in Table 4-2, there are two groups of page layout commands, those that
affect horizontal placement of text on the page and those that affect vertical placement.
A moment’s glance at these requests will tell you that, before anything else, we need to
talk about units.
Units of Measure
By default, most nroff and troff commands that measure vertical distance (such
as .sp) do so in terms of a number of “lines” (also referred to as vertical spaces, or
vs). The nroff program has constant, device-dependent line spacing; troff has
variable line spacing, which is generally proportional to the point size. However, both
programs do allow you to use a variety of other units as well. You can specify spacing
in terms of inches and centimeters, as well as the standard printer’s measures picas and
points. (A pica is 1/6 of an inch; a point is about 1/72 of an inch. These units were
originally developed to measure the size of type, and the relationship between these two
units is not as arbitrary as it might seem. A standard 12-point type is 1 pica high.)
Horizontal measures, such as the depth of an indent, can also be specified using
any of these measures, as well as the printer’s measures ems and ens. These are relative
measures, originally based on the size of the letters m and n in the current type size and
typeface. By default, horizontal measures are always taken to be in ems.
There is also a relationship between these units and points and picas. An em is
always equivalent in width to the height of the character specified by the point size. In
other words, an em in a 12-point type is 12 points wide. An en is always half the size
of an em, or half of the current point size. The advantage of using these units is that
they are relative to the size of the type being used. This is unimportant in nroff,
but using these units in troff gives increased flexibility to change the appearance of
the document without recoding.
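The relationship between ems, ens, points, and inches can be checked with a few lines of arithmetic. The following Python sketch is our own illustration, not part of nroff or troff; the function names are hypothetical, and it uses the approximation of 72 points to the inch given above.

```python
from fractions import Fraction

POINTS_PER_INCH = 72  # the text's approximation: a point is about 1/72 inch

def em_in_points(point_size):
    """An em is as wide as the current point size is high."""
    return Fraction(point_size)

def en_in_points(point_size):
    """An en is always half an em."""
    return Fraction(point_size) / 2

def points_to_inches(points):
    """Convert a measure in points to inches."""
    return Fraction(points) / POINTS_PER_INCH

# In 12-point type an em is 12 points wide, which is 1/6 inch (one pica).
print(em_in_points(12), en_in_points(12), points_to_inches(em_in_points(12)))
```

Changing the point size changes every em-based measure with it, which is why indents given in ems survive a change of type size without recoding.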
The nroff and troff programs measure not in any of these units, but in
device-dependent basic units. Any measures you specify are converted to basic units
before they are used. Typically, nroff measures in horizontal units of 1/240 of an
inch and otroff uses a unit of 1/432 inch. These units too are not as arbitrary as
they may seem. According to Joseph Ossanna’s Nroff/Troff User’s Manual (the original,
dense, and authoritative documentation on troff published by AT&T as part of
the UNIX Programmer’s Manual), the nroff units were chosen as “the least common
multiple of the horizontal and vertical resolutions of various typewriter-like output
devices.” The units for otroff were based on the C/A/T typesetter (the device for
which troff was originally designed), which could move in horizontal increments of
1/432 of an inch and in vertical increments of exactly one-third that, or 1/144 inch.
Units for ditroff depend on the resolution of the output device. For example, units
for a 300 dot-per-inch (dpi) laser printer will be 1/300 of an inch in either a vertical or a
horizontal direction. See Appendix D for more information on ditroff device
units.
You don’t need to remember the details of all these measures now. You can gen-
erally use the units that are most familiar to you, and we’ll come back to the others
when we need them.
To specify units, you simply need to add the appropriate scale indicator from
Table 4-3 to the numeric value you supply to a formatting request. For example, to
space down 3 inches rather than 3 lines, enter the request:
.sp 3i
The numeric part of any scale indicator can include decimal fractions. Before the specified
value is used, nroff and troff will round the value to the nearest number of
device units.
Indicator   Units
c           Centimeters
i           Inches
m           Ems
n           Ens
p           Points
P           Picas
u           Device Units
v           Vertical spaces (lines)
none        Default
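To make the conversion to basic units concrete, here is a minimal Python sketch of what happens to a scaled value. It is an illustration under stated assumptions rather than the formatter's actual code: the function name and the dictionary keys are our own, and only the absolute scale indicators are handled, because ems, ens, and vertical spaces depend on the current type size and line spacing.

```python
import math

# Basic units per inch for the devices mentioned in the text.
# The key names are hypothetical labels chosen for this sketch.
RESOLUTION = {"nroff": 240, "otroff": 432, "laser300": 300}

def to_basic_units(value, indicator, units_per_inch):
    """Convert a scaled value such as 3i or 2.5c to whole basic units."""
    per_unit = {
        "i": units_per_inch,           # inches
        "c": units_per_inch / 2.54,    # centimeters
        "P": units_per_inch / 6,       # picas (1/6 inch)
        "p": units_per_inch / 72,      # points (about 1/72 inch)
        "u": 1,                        # already basic units
    }[indicator]
    # Round to the nearest whole basic unit, as the text describes.
    return math.floor(value * per_unit + 0.5)

# .sp 3i on a typewriter-like device and on the C/A/T typesetter
print(to_basic_units(3, "i", RESOLUTION["nroff"]))    # 720
print(to_basic_units(3, "i", RESOLUTION["otroff"]))   # 1296
```

The same request therefore produces different internal numbers on different devices, while the distance on paper stays the same.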
In fact, you can use any reasonable numeric expression with any request that
expects a numeric argument. However, when using arithmetic expressions, you have to
be careful about what units you specify. All of the horizontally oriented requests
(.ll, .in, .ti, .ta, .po, .lt, and .mc) assume you mean ems unless you
specify otherwise.
Vertically oriented requests like .sp assume v’s unless otherwise specified.
The only exceptions to this rule are .ps and .vs, which assume points by default,
but these are not really motion requests anyway.
As a result, if you make a request like:
.ll 7i/2
what you are really requesting is:
.ll 7i/2m
The request:
.ll 7i/2i
is not what you want either. In performing arithmetic, as with fractions, the formatter
converts scaled values to device units. In otroff, this means the previous expression
is really evaluated as:
.ll (7*432u)/(2*432u)
If you really want half of 7 inches, you should specify the expression like this:
.ll 7i/2u
You could easily divide 7 by 2 yourself and simply specify 3.5i. The point of this
example is that when you are doing arithmetic (usually with values stored in variables
called number registers, more on these later) you will need to pay attention to the
interaction between units. Furthermore, because fractional device units are always
rounded down, you should avoid expressions like 7i/2.5u because this is equivalent
to 7i/2u.
Many requests also accept a relative value, written with a + or a - before the
number. For example, the request:
.ll -.5i
will subtract ½ inch from the current line length, whatever it is.
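The effect of the default scaling on line-length arithmetic can be worked through numerically. The Python sketch below models it for otroff under two assumptions of our own: a current point size of 10 (so that one em is 60 basic units) and integer division standing in for the formatter's truncation. The function name is hypothetical.

```python
UNITS_PER_INCH = 432                                # otroff basic units per inch
POINT_SIZE = 10                                     # assumed current point size
UNITS_PER_EM = POINT_SIZE * UNITS_PER_INCH // 72    # 60 units at 10 points

def scaled(value, indicator="m"):
    """Convert one operand to basic units; ems are the horizontal default."""
    factor = {"i": UNITS_PER_INCH, "m": UNITS_PER_EM, "u": 1}[indicator]
    return int(value * factor)                      # fractions are dropped

# A bare 2 in a line-length expression is taken as 2 ems.
print(scaled(7, "i") // scaled(2))        # 25 units, a uselessly short line
# With 2i, both operands are converted before the division.
print(scaled(7, "i") // scaled(2, "i"))   # 3 units
# With 2u, 7 inches' worth of units is divided by a plain 2.
print(scaled(7, "i") // scaled(2, "u"))   # 1512 units, or 3.5 inches
```

Only the last form gives half of 7 inches, because only there is the divisor a pure number of units.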
Setting Margins
In nroff and troff, margins are set by the combination of the .po (page offset)
and .ll (line length) requests. The .po request defines the left margin. The .ll
request defines how long each line will be after filling, and so implicitly defines the
right margin:
[Figure: a page on which the page offset (po) sets the left margin and the line length (ll) spans the text, leaving the right margin.]
The nroff program's default line length of 6.5 inches is fairly standard for an 8½-by-11-inch
page; it allows for 1-inch margins on either side.
Assuming that we'd like 1¼-inch margins on either side of the page, we would
issue the following requests:
.ll 6i
.po 1.25i
This will give us 1¼ inches for both the right and left margins. The .po request
specifies a left margin, or page offset, of 1¼ inches. When the 6-inch line length is
added to this, it will leave a similar margin on the right side of the page.
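The right margin is simply what is left of the paper width once the page offset and line length are taken out. A short Python sketch of that subtraction follows; the function name is our own, and the 8.5-inch paper width is an assumption, since the formatter itself knows nothing about the width of the paper.

```python
from fractions import Fraction

def right_margin(paper_width, page_offset, line_length):
    """Whatever the page offset and line length leave over (in inches)."""
    return Fraction(paper_width) - Fraction(page_offset) - Fraction(line_length)

# .po 1.25i and .ll 6i on 8.5-inch-wide paper
print(right_margin("8.5", "1.25", "6"))    # 5/4, that is, 1 1/4 inches
# The default 6.5-inch line with a 1-inch page offset
print(right_margin("8.5", "1", "6.5"))     # 1
```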
Let's take a look at how our sample letter will format now. One paragraph of the
output should give you the idea.
In our conversation last Thursday, we
discussed a documentation project that would
produce a user's guide and reference manual for
the Alcuin product. Yesterday, I received the
product demo and other materials that you sent me.
As we saw earlier, nroff assumes a default page offset of 0. Either you or the macro
package you are using must set the page offset. In troff, though, there is a default
page offset of 26/27 inch, so you can get away without setting this value.
(Keep in mind that all nroff output examples are actually simulated with
troff, and are reduced to fit on our own 5-inch wide printed page. As a result, the
widths shown in our example output are not exact, but are suggestive of what the actual
result would be on an 8½-by-11-inch page.)
Setting Indents
In addition to the basic page offset, or left margin, you may want to set an indent, either
for a single line or an entire block of text. You may also want to center one or more
lines of text.
To do a single-line indent, as is commonly used to introduce a paragraph, use the
.ti (temporary indent) request. For example, if you followed the blank lines between
paragraphs in the sample letter with the request .ti 5, you’d get a result like this
from nroff:
...Yesterday, I received the product demo and other
materials that you sent me.
...Yesterday,
I received the product demo and other
materials that you sent me.
.in 4
.ti -4
1. Going through a demo session gave me a much better
understanding of the product. I confess to being amazed by
Alcuin...
The first line will start at the margin, and subsequent lines will be indented:
...Yesterday, I received the product demo and other
materials that you sent me. After studying them,
I want to clarify a couple of points:
Centering takes into account any indents that are in effect. That is, if you have used
.in to specify an indent of 1 inch, and the line length is 5 inches, text will be centered
within the 4-inch span following the indent.
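The arithmetic behind centering within an indent can be sketched in a few lines of Python. This is our own model of the rule just stated, with a hypothetical function name and all measurements in inches; it is not taken from the formatter's source.

```python
def centered_start(text_width, line_length, indent=0.0):
    """Distance from the page offset to the start of a centered line.

    The text is centered in the span that remains after the indent
    is taken out of the line length.
    """
    span = line_length - indent
    return indent + (span - text_width) / 2

# 5-inch line length, 1-inch indent, text 2 inches wide:
# the span is 4 inches, so the text starts 1 inch into it.
print(centered_start(2.0, 5.0, indent=1.0))   # 2.0
```

With no indent the same text would start 1.5 inches from the page offset, so an indent shifts centered lines to the right by half its width.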
To center multiple lines, specify a number as an argument to the request:
.ce 3
Documentation for the Alcuin Product

A Proposal Prepared by
Fred Caslon
will produce:
      Documentation for the Alcuin Product

             A Proposal Prepared by
                  Fred Caslon
Notice that .ce centered all three text lines, ignoring the blank line between.
To center an indeterminately large number of lines, specify a very large number
with the .ce request, then turn it off by entering .ce 0:
.ce 1000
Many lines of text here.
.ce 0
In looking at the examples, you probably noticed that centering automatically dis-
ables filling and justification. Each line is centered individually. However, there is also
the case in which you would like to center an entire filled and justified paragraph.
(This paragraph style is often used to set off quoted material in a book or paper.) You
can do this by using both the .in and .ll requests:
I was particularly interested by one comment that I
read in your company literature:
.in +5n
.ll -5n
The development of Alcuin can be traced back to our
founder’s early interest in medieval manuscripts.
He spent several years in the seminary before
becoming interested in computers. After he became
an expert on typesetting software, he resolved to
put his two interests together.
.in -5n
.ll +5n
Remember that a line centered with .ce takes into account any indents in effect at the
time. You can visualize the relationship between page offset, line length, indents, and
centering as follows:
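[Figure: a page showing the page offset (po), the line length (ll), the indent (in), and a centered line (ce).]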
Setting Tabs
No discussion of how to align text would be complete without a discussion of tabs. A
tab, as anyone who has used a typewriter well knows, is a horizontal motion to a prede-
fined position on the line.
The problem with using tabs in nroff and troff is that what you see on the
screen is very different from what you get on the page. Unlike a typewriter or a
WYSIWYG word processor, the editor/formatter combination presents you with two different
tab settings. You can set tabs in vi, and you can set them in nroff and
troff, but the settings are likely to be different, and the results on the screen definitely
unaesthetic.
However, after you get used to the fact that tabs will not line up on the screen in
the same way as they will on the printed page, you can use tabs quite effectively.
By default, tab stops are set every 0.8 inch in nroff and every 0.5 inch in
troff. To set your own tab stops in nroff or troff, use the .ta request. For
example:
.ta 1i 2.5i 3i
will set three tab stops, at 1 inch, 2½ inches, and 3 inches, respectively. Any previous
or default settings are now no longer in effect.
You can also set incremental tab stops. The request:
.ta 1i +1.5i +.5i
will set tabs at the same positions as the previous example. Values preceded with a
plus sign are added to the value of the last tab stop.
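To confirm that the incremental form really names the same positions, the following Python sketch resolves a list of .ta arguments into absolute stops. It is a simplified illustration with a hypothetical function name: it accepts only values in inches and ignores the alignment letters discussed next.

```python
def resolve_tab_stops(args):
    """Turn .ta arguments (in inches) into absolute positions.

    A value with a leading plus sign is added to the previous stop.
    Only the i scale indicator is handled in this sketch.
    """
    stops = []
    for arg in args.split():
        relative = arg.startswith("+")
        inches = float(arg.lstrip("+").rstrip("i"))
        if relative and stops:
            inches += stops[-1]
        stops.append(inches)
    return stops

print(resolve_tab_stops("1i 2.5i 3i"))       # [1.0, 2.5, 3.0]
print(resolve_tab_stops("1i +1.5i +.5i"))    # [1.0, 2.5, 3.0]
```

The incremental form is convenient when columns have fixed widths, because moving the first stop moves all the others with it.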
You can also specify the alignment of text at a tab stop. Settings made with a
numeric value alone are left adjusted, just as they are on a typewriter. However, by
adding either the letter R or C to the definition of a tab stop, you can make text right
adjusted or centered on the stop.
For example, the following input lines (where a tab character is shown by the
symbol <tab>):
.nf
.ta 1i 2.5i 3.5i
<tab>First<tab>Second<tab>Third
.fi
will produce:
Right-adjusted tabs can be useful for aligning numeric data. This is especially
true in troff, where all characters (including blank spaces) have different sizes, and,
as a result, you can’t just line things up by eye. If the numbers you want to align have
an uneven number of decimal positions, you can manually force right adjustment of
numeric data using the special escape sequence \0, which will produce a blank space
exactly the same width as a digit. For example (with each tab character shown as <tab>):
.ta 1iR
<tab>500.2\0
<tab>125.35
<tab>150.\0\0
will produce:
    500.2
    125.35
    150.
As on a typewriter, if you have already spaced past a tab position (either by print-
ing characters, or with an indent or other horizontal motion), a tab in the input will push
text over to the next available tab stop. If you have passed the last tab stop, any tabs
present in the input will be ignored.
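Returning to right-adjusted numeric columns: the \0 padding can be worked out mechanically rather than by eye. The Python sketch below is our own helper, not a UNIX command; it assumes every number is written with a decimal point and appends one \0 for each missing decimal digit.

```python
def pad_decimals(numbers):
    """Append \\0 escapes so every number has the same count of decimals.

    Each number is a string that contains a decimal point.
    """
    widest = max(len(n.partition(".")[2]) for n in numbers)
    return [n + r"\0" * (widest - len(n.partition(".")[2])) for n in numbers]

for line in pad_decimals(["500.2", "125.35", "150."]):
    print(line)
# prints:
# 500.2\0
# 125.35
# 150.\0\0
```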
You must be in no-fill mode for tabs to work correctly. This is not just because
filling will override the effect of the tabs. Using .nf when specifying tabs is an
important rule of thumb; we’ll look at the reasoning behind it in Chapter 15.
Underlining
We haven’t yet described how to underline text, a primary type of emphasis in
nroff, which lacks the troff ability to switch fonts for emphasis.
There are two underlining requests: .ul (underline) and .cu (continuous
underline). The .ul request underlines only printable characters (the words, but not
the spaces), and .cu underlines the entire text string.
These requests are used just like .ce. Without an argument, they underline the
text on the following input line. You can use a numeric argument to specify that more
than one line should be underlined.
Both of these requests produce italics instead of underlines in troff. Although
there is a request, .uf, that allows you to reset the underline font to some other font
than italics,* there is no way to have these requests produce underlining in
troff. (The ms and mm macro packages both include a macro to do underlining in
troff, but this uses an entirely different mechanism, which is not explained until
Chapter 15.)
*This request is generally used when the document is being typeset in a font family other than Times
Roman. It might be used to set the “underline font” to Helvetica Italic, rather than the standard Italic.
In our conversation last Thursday, we discussed a documentation project that would
produce a user’s guide and reference manual for the Alcuin product. Yesterday, I
received the product demo and other materials that you sent me.

Going through a demo session gave me a better understanding of the product. I con-
fess to being amazed by Alcuin. Some people around here, looking over my
shoulder, were also astounded by the illuminated manuscript I produced with Alcuin.
One person, a student of calligraphy, was really impressed.
The output would probably look better if there were a smaller amount of space between
the paragraphs. If we replace the line between the paragraphs with the request .sp .5,
here is what we will get:
Going through a demo session gave me a much better understanding of the product.
I confess to being amazed by Alcuin. Some people around here, looking over my
shoulder, were also astounded by the illuminated manuscript I produced with Alcuin.
One person, a student of calligraphy, was really impressed.
Although it may not yet be apparent how this will be useful, you can also space to an
absolute position on the page, by inserting a vertical bar before the distance. The fol-
lowing:
.sp |3i
will space down to a position 3 inches from the top of the page, rather than 3 inches
from the current position.
You can also use negative values with ordinary relative spacing requests. For
example:
.sp -3
will move back up the page three lines. Of course, when you use any of these requests,
you have to know what you are doing. If you tell nroff or troff to put one line
on top of another, that’s exactly what you’ll get. For example:
This is the first line.
.sp -2
This is the second line.
.br
This is the third line.
produces:
This is the second line.
[This is the first line. overprinted by This is the third line.]
Sure enough, the second line is printed above the first, but because we haven’t restored
the original position, the third line is then printed on top of the first.
When you make negative vertical motions, you should always make compensatory
positive motions, so that you end up at the correct position for future output. The previ-
ous example would have avoided disaster if it had been coded:
This is the first line.
.sp -2
This is the second line.
.sp
This is the third line.
(Notice that you need to space down one less line than you have spaced up because, in
this case, printing the second line “uses up” one of the spaces you went back on.)
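The bookkeeping in these two examples can be followed with a small simulation. The Python sketch below is a deliberately tiny model of our own, not the formatter's algorithm: every text line is printed on its own row, .sp moves by whole lines, and any other request is ignored. The function name and the starting row are arbitrary.

```python
def place_lines(script, start_row=10):
    """Return {row: [texts]} for a tiny subset of nroff input.

    Handles text lines (each printed on its own row, as after a break)
    and .sp N with N in whole lines; a bare .sp means one line.
    """
    rows = {}
    row = start_row
    for line in script:
        if line.startswith(".sp"):
            arg = line[3:].strip()
            row += int(arg) if arg else 1
        elif line.startswith("."):
            continue                      # .br and friends: no motion here
        else:
            rows.setdefault(row, []).append(line)
            row += 1                      # printing a line moves down one row
    return rows

bad = place_lines(["first", ".sp -2", "second", ".br", "third"])
good = place_lines(["first", ".sp -2", "second", ".sp", "third"])
print(bad)    # {10: ['first', 'third'], 9: ['second']}
print(good)   # {10: ['first'], 9: ['second'], 11: ['third']}
```

In the first run two lines share row 10, which is the overprinting seen above; in the second, the single compensating .sp puts the third line on a row of its own.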
This kind of vertical motion is generally used for line drawing (e.g., for drawing
boxes around tables), in which all of the text is output, and the formatter then goes
back up the page to draw in the lines. At this stage, it is unlikely that you will find an
immediate use for this capability. Nonetheless, we are sure that a creative person,
knowing that it is there, will find it just the right tool for a job. (We’ll show a few
creative uses of our own later.)
You probably aren’t surprised that a typesetter can go back up the page. But you
may wonder how a typewriter-like printer can go back up the page like this. The
answer is that it can’t. If you do any reverse line motions (and you do when you use
certain macros in the standard packages, or the tbl and eqn preprocessors), you
must pass the nroff output through a special filter program called col to get all of
the motions sorted out beforehand, so that the page will be printed in the desired order:
$ nroff files | col | lp
Page Transitions
If we want space at the top of our one-page letter, it is easy enough to insert the com-
mand:
.sp 1i
before the first line of the text. However, nroff and troff do not provide an
easy way of handling page transitions in multipage documents.
By default, nroff and troff assume that the page length is 11 inches. However,
neither program makes immediate use of this information. There is no default top
and bottom margin, so text output begins on the first line, and goes to the end of the
page.
The .bp (break page) request allows you to force a page break. If you do this,
the remainder of the current page will be filled with blank lines, and output will start
again at the top of the second page. If you care to test this, insert a .bp anywhere in
the text of our sample letter, then process the letter with nroff. If you save the
resulting output in a file:
$ nroff letter > letter.out
you will find that the text following the .bp begins on line 67 (11 inches at 6 lines per
inch equals 66 lines per page).
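The same arithmetic gives the starting line of any page in the saved output. The Python sketch below uses a hypothetical function name and assumes the 11-inch page and 6 lines per inch mentioned above.

```python
def first_line_of_page(page, page_length_inches=11, lines_per_inch=6):
    """1-based line number in the output file where a given page starts."""
    lines_per_page = page_length_inches * lines_per_inch
    return (page - 1) * lines_per_page + 1

print(first_line_of_page(2))   # 67
```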
To automatically leave space at the top and bottom of each page, you need to use
the .wh (when) request. In nroff and troff parlance, this request sets a trap: a
position on the page at which a given macro will be executed.
You’ll notice that we said macro, not request. There’s the rub. To use .wh,
you need to know how to define a macro. It doesn’t work with single requests.
There’s not all that much to defining macros, though. A macro is simply a
sequence of stored requests that can be executed all at once with a single command.
We’ll come back to this later, after we’ve looked at the process of macro definition.
For the moment, let’s assume that we’ve defined two macros, one containing the
commands that will handle the top margin, and another for the bottom margin. The
first macro will be called .TM, and the second .BM. (By convention, macros are
often given names consisting of uppercase letters, to distinguish them from the basic
nroff and troff requests. However, this is a convention only, and one that is not
always followed.)
To set traps that will execute these macros, we would use the .wh request as fol-
lows:
.wh 0 TM
.wh -1i BM
The first argument to .wh specifies the vertical position on the page at which to exe-
cute the macro. An argument of 0 always stands for the top of the page, and a nega-
tive value is always counted from the bottom of the page, as defined by the page length.
In its simplest form, the .TM macro need only contain the single request to space
down 1 inch, and .BM need only contain the single request to break to a new page. If
.wh allowed you to specify a single request rather than a macro, this would be
equivalent to:
.wh 0 .sp 1i
.wh -1i .bp
With an 11-inch page length, this would result in an effective 9-inch text area, because
on every page, the formatter’s first act would be to space down 1 inch, and it would
break to a new page when it reached 1 inch from the bottom.
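To see how these two traps divide a document into pages, here is a rough Python model. It is a sketch under our own assumptions (6 lines per inch, a 66-line page, and text counted in whole output lines), with a hypothetical function name; it does not execute real macros.

```python
def paginate(total_lines, page_length=66, top_margin=6, bottom_trap=6):
    """Split a count of text lines into per-page counts.

    top_margin models a top-of-page macro that spaces down 1 inch (6 lines);
    bottom_trap models a trap 1 inch from the bottom that starts a new page.
    """
    per_page = page_length - top_margin - bottom_trap
    pages = []
    while total_lines > 0:
        pages.append(min(per_page, total_lines))
        total_lines -= per_page
    return pages

print(paginate(100))   # [54, 46]
```

The 54 lines on a full page are the 9-inch text area: 66 lines less 6 at the top and 6 at the bottom.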
You might wonder why nroff and troff have made the business of page
transition more complicated than any of the other essential page layout tasks. There are
two reasons:
switch to the style used by the header or footer, and then revert to the original
style when it returns to the main text. Or consider the matter of footnotes-the
position at which the page ends is different when a footnote is on the page.
The page transition trap must make some allowance for this.
In short, what you might like the formatter to do during page transitions can vary. For
this reason, the developers of nroff and troff have allowed users to define their
own macros for handling this area.
When you start out with nroff or troff, we advise you to use one of the
ready-made macro packages, ms or mm. The standard macro package for UNIX systems
based on System V is mm; the standard on Berkeley UNIX systems is ms.
Berkeley UNIX systems also support a third macro package called me. In addition,
there are specialized macro packages for formatting viewgraphs, standard UNIX reference
manual pages (man), and UNIX permuted indexes (mptx). Only the ms and
mm packages are described in this book. The macro packages have already taken into
account many of the complexities in page transition (and other advanced formatting
problems), and provide many capabilities that would take considerable time and effort
to design yourself.
Of course, it is quite possible to design your own macro package, and we will go
into all of the details later. (In fact, this book is coded with neither of the standard
macro packages, but with one developed by Steve Kochan and Pat Wood of Pipeline
Associates, the consulting editors of this series, for use specifically with the Hayden
UNIX library.)
changing the page length to 9.5 inches, and setting 1-inch margins at the top
and bottom;
leaving the page length at 11 inches, and setting 1.75-inch margins at the top
and bottom.
In general, we prefer to think of .pl as setting the paper length, and use the page
transition traps to set larger or smaller margins.
However, there are cases where you really are working with a different paper size.
A good example of this is printing addresses on envelopes: the physical paper height is
about 4 inches (24 lines on a typewriter-like printer printing 6 lines per inch), and we
want to print in a narrow window consisting of four or five lines. A good set of definitions
for this case would be:
.pl 4i
.wh 0 TM
.wh -9v BM
with .TM containing the request .sp 9v, and with .BM, as before, containing
.bp.
There is more to say about traps, but it will make more sense later, so we’ll leave
the subject for now.
Fortunately, there is a way around this problem. If you begin a request with an apostrophe
instead of a period, the request will not cause a break.
(In fact, most page transition macros use this feature to make paragraphs continue
across page boundaries. We’ll take a closer look at this in later chapters.)
Another very useful request is the conditional page break, or .ne (need) request.
If you want to make sure an entire block of text appears on the same page, you can use
this request to force a page break if there isn’t enough space left. If there is sufficient
space, the request is ignored.
For example, the two requests:
.ne 3.2i
.sp 3i
might be used to reserve blank space to paste in an illustration that is 3 inches high.
The .ne request does not cause a break, so you should be sure to precede it with
.br or another request that causes a break if you don’t want the remnants of the
current line buffer carried to the next page if the .ne is triggered.
It is often better to use .ne instead of .bp, unless you’re absolutely sure that
you will always want a page break at a particular point. If, in the course of editing, an
.ne request moves away from the bottom of the page, it will have no effect. But a
.bp will always start a new page, sometimes leaving a page nearly blank when the text
in a file has been changed significantly.
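The decision that .ne makes can be stated in one line. The Python sketch below is our own model with a hypothetical function name; it assumes the request compares its argument with the space remaining before the bottom-of-page trap, both measured in inches here.

```python
def need(remaining, required):
    """Model of .ne: True if a page break is forced.

    remaining is the space left before the bottom-of-page trap and
    required is the argument to .ne, both in the same units.
    """
    return remaining < required

print(need(2.0, 3.2))   # True: the block would not fit, so start a new page
print(need(5.0, 3.2))   # False: there is room, so the request is ignored
```

Because the answer depends on where the request happens to fall, the same .ne can trigger a break in one draft and do nothing in the next, which is exactly what makes it safer than a fixed .bp.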
There are other special spacing requests that can be used for this purpose.
(Depending on the macro package, these may have to be used.) For example, .sv
(save space) requests a block of contiguous space. If the remainder of the page does
not contain the requested amount of space, no space is output. Instead, the amount of
space requested is remembered and is output when an .os (output saved space)
request is encountered.
These are advanced requests, but you may need to know about them because most
macro packages include two other spacing requests in their page transition macros:
.ns (no space) and .rs (restore space). An .ns inhibits the effect of spacing
requests; .rs restores the effectiveness of such requests.
Both the ms and mm macros include an .ns request in their page transition
macros. As a result, if you issue a request like:
.sp 3i
with 1 inch remaining before the bottom of the page, you will not get 1 inch at the bottom,
plus 2 inches at the top of the next page, but only whatever remains at the bottom.
The next page will start right at the top. However, both macro packages also include an
.os request in their page top macro, so if you truly want 3 inches, use .sv 3i, and
you will get the expected result.
However, if you use .sv, you will also have another unexpected result: text
following the spacing request will “float” ahead of it to fill up the remainder of the
current page.
We’ll talk more about this later. We introduced it now to prevent confusion when
spacing requests don’t always act the way you expect.
Page Numbering
The nroff and troff programs keep track of page numbers and make the current
page number available to be printed out (usually by a page transition macro). You can
artificially set the page number with the .pn request:
You can also artificially set the number for the next page whenever you issue a .bp
request, simply by adding a numeric argument:
.bp 5     Break the page and set the next page number to 5
.bp +5    Break the page and increment the next page number by 5
.bp -5    Break the page and decrement the next page number by 5
The starting page number (usually 1) can also be set from the command line, using the
-n option. For example:
$ nroff -n10 file
will start numbering file at page number 10. In addition, there is a command-line
option to print only selected pages of the output. The -o option takes a list of page
numbers as its argument. The entire file (up to the last page number in the list) is pro-
cessed, but only the specified pages are output. The list can include single pages
separated by commas, or a range of pages separated by a hyphen, or both. A number
followed by a trailing hyphen means to output from that page to the end. For example:
$ nroff -ms -o1,5,7-9,13- file
will output pages 1, 5, 7 through 9, and from 13 to the end of the file. There should be
no spaces anywhere in the list.
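The list syntax is easy to take apart by hand. The Python sketch below is an illustration of our own, not the formatter's parser, and its function names are hypothetical. It reads a list such as 1,5,7-9,13- and reports which pages of a 15-page document would be printed; treating a leading hyphen as "from page 1" is an assumption of the sketch.

```python
def parse_page_list(spec):
    """Parse a -o style list such as 1,5,7-9,13- into (first, last) ranges.

    last is None for an open-ended range like 13-.
    """
    ranges = []
    for item in spec.split(","):
        first, dash, last = item.partition("-")
        start = int(first) if first else 1     # assumption: -N starts at page 1
        if not dash:
            ranges.append((start, start))
        else:
            ranges.append((start, int(last) if last else None))
    return ranges

def is_selected(page, ranges):
    """True if the page falls inside any of the parsed ranges."""
    return any(lo <= page and (hi is None or page <= hi) for lo, hi in ranges)

ranges = parse_page_list("1,5,7-9,13-")
print([p for p in range(1, 16) if is_selected(p, ranges)])
# [1, 5, 7, 8, 9, 13, 14, 15]
```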
Changing Fonts
In old troff (otroff), you were limited to four fonts at a time, because the fonts
had to be physically mounted on the C/A/T typesetter. With ditroff and a laser
printer or a modern typesetter, you can use a virtually unlimited number of fonts in the
same document.
In otroff you needed to specify the basic fonts that are in use with the .fp
(font position) request. Normally, at the front of a file (or, more likely, in the macro
package), you would use this request to specify which fonts are mounted in each of the
four quadrants (positions) of the typesetter wheel. By default, the roman font is
mounted in position 1, the italic font in position 2, the bold font in position 3, and the
special font in position 4. That is, troff acts as though you had included the lines:
.fp 1 R
.fp 2 I
.fp 3 B
.fp 4 S
In ditroff, up to ten fonts are automatically mounted, with the special font in position
10. Which fonts are mounted, and in which positions, depends on the output device.
See Appendix D for details. The font that is mounted in position 1 will be used
for the body type of the text; it is the font that will be used if no other specification is
given. The special font is also used without any intervention on your part when a character
not in the normal character set is requested.
To request one of the other fonts, you can use either the .ft request, or the
inline font-switch escape sequence \f.
For example:
.ft B
This line will be set in bold type.
.br
.ft R
This line will again be set in roman type.
will produce:
This line will be set in bold type.
This line will again be set in roman type.
You can also change fonts using an inline font escape sequence. For example, the
preceding sentence was coded like this:
...an inline font \fIescape sequence\fP.
You may wonder at the \fP at the end, rather than \fR. The P command is a special
code that can be used with either the .ft request or the \f escape sequence. It
means “return to the previous font, whatever it was.” This is often preferable to an
explicit font request, because it is more general.
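Why P is more general than an explicit font name is easiest to see with a small model of the font state. The Python sketch below is our own illustration with hypothetical names; it assumes the formatter remembers exactly one previous font and that P exchanges the current and previous fonts.

```python
class FontState:
    """Tracks the current and previous font, as .ft and \\f are described."""

    def __init__(self, initial="R"):
        self.current = initial
        self.previous = initial

    def switch(self, name):
        if name == "P":
            # Swap: go back to whatever font was in use before.
            self.current, self.previous = self.previous, self.current
        else:
            self.previous, self.current = self.current, name
        return self.current

fonts = FontState("B")          # suppose the surrounding text is bold
print(fonts.switch("I"))        # I
print(fonts.switch("P"))        # B  (an explicit R would have been wrong here)
```

A phrase coded with \fI...\fP therefore works unchanged whether it appears in roman body text or inside a bold heading.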
All of this raises the question of fonts other than Times Roman, Bold, and
Italic. There are two issues: first, which fonts are available on the output device, and
second, which fonts troff has width tables for. (As described previously,
troff uses these tables to determine how far to space over after it outputs each character.)
For otroff these width tables are in the directory /usr/lib/font, in
files whose names begin with ft. If you list the contents of this directory, you might
see something like this for otroff:
$ ls /usr/lib/font
ftB     ftBC    ftC     ftCE    ftCI
ftCK    ftCS    ftCW    ftFD    ftG
ftGI    ftGM    ftGR    ftH     ftHB
ftHI    ftI     ftL     ftLI    ftPA
ftPB    ftPI    ftR     ftS     ftSB
ftSI    ftSM    ftUD
You can pick out the familiar R, I, B, and S fonts, and may guess that ftH, ftHI,
and ftHB refer to Helvetica, Helvetica Italic, and Helvetica Bold fonts. However,
unless you are familiar with typesetting, the other names might as well be Greek to you.
In any event, these width tables, normally supplied with troff, are for fonts that are
commonly used with the C/A/T typesetter. If you are using a different device, they may
be of no use to you.
The point is that if you are using a different typesetting device, you will need to
get information about the font names for your system from whoever set up the equip-
ment to work with troff. The contents of /usr/lib/font will vary from
installation to installation, depending on what fonts are supported.
For ditroff, there is a separate subdirectory in /usr/lib/font for each
supported output device. For example:
$ ls /usr/lib/font
devlj devps
$ ls /usr/lib/font/devps
B.out BI.out CB.out CI.out CW.out CX.out
DESC.out H.out HB.out HI.out HK.out HO.out
HX.out I.out LI.out PA.out PB.out PI.out
PX.out R.out O.out RS.out S.out S1.out
[Type samples: Helvetica, Helvetica Italic, and Helvetica Bold, each showing the lowercase and uppercase alphabets, the digits, punctuation, and special characters.]
When specifying two-character font names with the \f escape sequence, you
must add the ( prefix as well. For example, you would specify Helvetica Italic by the
inline sequence \f(HI, and Helvetica Bold by \f(HB.
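A short sketch of both forms side by side, assuming the HB and HI fonts are available on your output device (with otroff they would first have to be mounted with .fp):

```troff
.\" One-character font names follow \f directly; two-character names need \f(
This word is \fBbold\fP and these are \f(HBHelvetica Bold\fP.
.\" The .ft request takes the name as an ordinary argument, with no ( prefix.
.ft HI
This line is set in Helvetica Italic.
.ft P
```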
There is another issue when you are using fonts other than the Times Roman fam-
ily. Assume that you decide to typeset your document in Helvetica rather than Roman.
You reset your initial font position settings to read:
.fp 1 H
.fp 2 HI
.fp 3 HB
.fp 4 S
or:
\fB
You will need to make a set of global replacements throughout your file. To insulate
yourself in a broader way from overall font change decisions, troff allows you to
specify fonts by position, even within .ft and \f requests:
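A minimal sketch of font changes written by position rather than by name. The .fp lines repeat the Helvetica settings shown earlier and assume those fonts exist on your device; the sample text is invented:

```troff
.\" Mount the Helvetica family in positions 1 through 3.
.fp 1 H
.fp 2 HI
.fp 3 HB
.\" Refer to fonts by position, not by name.
.ft 3
This heading is set in whatever font is mounted in position 3.
.ft 1
Body text with \f2emphasis\f1 selected by position.
```

If the document is later switched back to the Times family, only the .fp lines need to change; the .ft 3 and \f2 references pick up whatever is mounted in those positions.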
Because you don’t need to use the .fp request to set font positions with ditroff,
and the range of fonts is much greater, you may have a problem knowing which fonts
are mounted in which positions. A quick way to find out which fonts are mounted is to
run ditroff on a short file, sending the output to the screen. For example:
$ ditroff -Tps junk | more
x T ps
x res 720 1 1
x init
x font 1 R
x font 2 I
x font 3 B
x font 4 BI
x font 5 CW
x font 6 CB
x font 7 H
x font 8 HB
x font 9 HI
x font 10 S
...
The font positions should appear at the top of the file. In this example, you see the
following fonts: (Times) Roman, (Times) Italic, (Times) Bold, (Times) Bold Italic,
Constant Width, Constant Bold, Helvetica, Helvetica Bold, Helvetica Italic, and Special.
Which font is mounted in which position is controlled by the file DESC.out in the
device subdirectory of /usr/lib/font. See Appendix D for details.
Special Characters
A variety of special characters that are not part of the standard ASCII character set are
supported by nroff and troff. These include Greek letters, mathematical symbols,
and graphic characters. Some of these characters are part of the font referred to
earlier as the special font. Others are part of the standard typesetter fonts.
Regardless of the font in which they are contained, special characters are included
in a file by means of special four-character escape sequences beginning with \(.
Appendix B gives a complete list of special characters. However, some of the
most useful are listed in Table 4-4, because even as a beginner you may want to include
them in your text. Although nroff makes a valiant effort to produce some of these
characters, they are really best suited for troff.
Name                     Escape Sequence    Output Character
square                   \(sq               □
baseline rule            \(ru               _
underrule                \(ul               _
1/4                      \(14               ¼
1/2                      \(12               ½
3/4                      \(34               ¾
degrees                  \(de               °
dagger                   \(dg               †
double dagger            \(dd               ‡
registered mark          \(rg               ®
copyright symbol         \(co               ©
section mark             \(sc               §
square root              \(sr               √
greater than or equal    \(>=               ≥
less than or equal       \(<=               ≤
not equal                \(!=               ≠
multiply                 \(mu               ×
divide                   \(di               ÷
plus or minus            \(+-               ±
right arrow              \(->               →
left arrow               \(<-               ←
up arrow                 \(ua               ↑
down arrow               \(da               ↓
We’ll talk more about some of these special characters as we use them. Some are
used internally by eqn for producing mathematical equations. The use of symbols
such as the copyright, registered trademark, and dagger is fairly obvious.
However, you shouldn’t limit yourself to the obvious. Many of these special
characters can be put to innovative use. For example, the square root symbol can be
used to simulate a check mark, and the square can become an alternate type of bullet.
As we’ll show in Chapter 15, you can create additional, effective character combina-
tions, such as a checkmark in a box, with overstriking.
The point is to add these symbols to your repertoire, where they can wait until
need and imagination provide a use for them.
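As a brief sketch of how these escape sequences sit in ordinary input (the sample text is invented, and the exact appearance of each character depends on the output device):

```troff
.\" Each special character is \( followed by its two-character name.
Copyright \(co 1987
.br
Store at 20\(de or below.
.br
.\" The square used as an alternate bullet.
\(sq Check the ribbon.
.br
\(sq Load the paper.
.br
The page measures 8\(12 \(mu 11 inches.
```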
A .ps request that does not specify any point size reverts to the previous point size
setting, whatever it was:
.ps 10
To switch point size in the middle of the line, use the \s escape sequence. For example,
many books reduce the point size when they print the word UNIX in the middle of a
line. The preceding sentence was produced by these input lines:
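A minimal sketch of such input, assuming an absolute size of 8 points for the reduced word (two points below the 10-point body type of this book):

```troff
For example, many books reduce the point size when
they print the word \s8UNIX\s0 in the middle of a line.
```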
As you can probably guess from the example, \s0 does not mean to use a point size
of 0, but to revert to the previous size.
In addition, you can use relative values when specifying point sizes. Knowing
that the body of the book is set in 10-point type, we could have achieved the same
result by entering:
For example, many books reduce the point size when
they print the word \s-2UNIX\s0 in the middle of a line.
You can increment or decrement point sizes only using a single digit; that is, you can’t
increment or decrement the size by more than 9 points.
Only certain sizes may be available on the typesetter. (Legal point sizes in
otroff are 6, 7, 8, 9, 10, 11, 12, 14, 16, 18, 20, 22, 24, 28, and 36. Legal point sizes
in ditroff depend upon the output device, but there will generally be more sizes
available.) If you request a point size between two legal sizes, otroff will round up
to the next legal point size; ditroff will round to the nearest available size.
Vertical Spacing
In addition to its ability to change typefaces and type sizes on the same page, a
typesetter allows you to change the amount of vertical space between lines. This spacing
is sometimes referred to as the baseline spacing because it is the distance between
the base of characters on successive lines. (The difference between the point size and
the baseline spacing is referred to as leading, from the old days when a human compositor
inserted thin strips of lead between successive lines of type.)
A typewriter or typewriter-style printer usually spaces vertically in 1/6-inch increments
(i.e., 6 lines per inch). A typesetter usually adjusts the space according to the
point size. For example, the type samples shown previously were all set with 20 points
of vertical space. More typically, the vertical space will vary along with the type size,
like this:
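A sketch of how such pairings might be coded, using the .vs request to set the baseline spacing. The particular values (2 points of leading in each case) are illustrative assumptions, not settings taken from this book:

```troff
.\" Body text: 10-point type on 12 points of vertical spacing.
.ps 10
.vs 12
This text is set 10 on 12.
.br
.\" Smaller type, with the spacing reduced along with it.
.ps 8
.vs 10
This text is set 8 on 10.
.br
.\" With no argument, each request reverts to its previous setting.
.ps
.vs
```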
will space down 6 points, or half the current vertical line spacing. However, if you
change the baseline vertical spacing to 16, the .sp request will space down 16 points.
Spacing specified in any other units will be unaffected. What all this adds up to is the
commonsense observation that a blank line takes up the same amount of space as one
containing text.
When you use double and triple spacing, it applies a multiplication factor to the
baseline spacing. The request .ls 2 will double the baseline spacing. You can
specify any multiplication factor you like, though 2 and 3 are the most reasonable
values.
The .ls request will only affect the spacing between output lines of text. It
does not change the definition of v or affect vertical spacing requests.
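A small sketch of the distinction, assuming a baseline spacing of 12 points (the sample text is invented):

```troff
.vs 12
.\" Double-space the output text: a blank line follows each output line.
.ls 2
These lines are filled and then
printed double-spaced.
.\" An explicit spacing request is unaffected: 1v is still 12 points.
.sp 1v
.\" Return to single spacing.
.ls 1
```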
Although we won’t go into all the details of macro design until we have discussed the
existing macro packages in the next two chapters, we’ll cover some of the basic con-
cepts here. This will help you understand what the macro packages are doing and how
they work.
To define a macro, you use the .de request, followed by the sequence of
requests that you want to execute when the macro is invoked. The macro definition is
terminated by the request .. (two dots). The name to be assigned to the macro is
given as an argument to the .de request.
You should consider defining a macro whenever you find yourself issuing a
repetitive sequence of requests. If you are not using one of the existing macro packages
(which have already taken care of this kind of thing), paragraphing is a good example of
the kind of formatting that lends itself to macros.
Although it is certainly adequate to separate paragraphs simply by a blank line,
you might instead want to separate them with a blank line and a temporary indent.
What’s more, to prevent “orphaned” lines, you would like to be sure that at least two
lines of each paragraph appear at the bottom of the page. So you might define the fol-
lowing macro:
.de P
.sp
.ne 2
.ti 5n
..
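Once the definition has been read, the macro is called like any request, on a line by itself. A minimal sketch, assuming the .P definition above appears earlier in the same file (the paragraph text is invented):

```troff
.P
This is the first paragraph. It is preceded by a blank line,
and its first line is indented 5 ens.
.P
This is the second paragraph, formatted in exactly the same way.
```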
Macro Arguments
Most basic troff requests take simple arguments-single characters or letters. Many
macros take more complex arguments, such as character strings. There are a few simple
pointers you need to keep in mind through the discussion of macro packages in the next
two chapters.
First, a space is taken by default as the separator between arguments. If a single
macro argument is a string that contains spaces, you need to quote the entire string to
keep it from being treated as a series of separate arguments.
For example, imagine a macro to print the title of a chapter in this book. The
macro call looks like this:
.CH 4 "Nroff and Troff"
A second point: to skip an argument that you want to ignore, supply a null string ("").
For example:
.CH "" "Preface"
As you can see, it does no harm to quote a string argument that doesn't contain spaces
("Preface"), and it is probably a good habit to quote all strings.
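To see why the null string matters, it helps to look at how a macro picks up its arguments. Inside a definition, \\$1 stands for the first argument and \\$2 for the second. The definition below is a hypothetical .CH written only for illustration; it is not the macro used to format this book:

```troff
.\" Hypothetical chapter-title macro, for illustration only.
.de CH
.sp 2
.if !"\\$1"" Chapter \\$1
.br
\fB\\$2\fP
.sp
..
.\" Argument 1 is the number, argument 2 the quoted title.
.CH 4 "Nroff and Troff"
.\" A null first argument keeps the title in second position.
.CH "" "Preface"
```

Without the "" placeholder, "Preface" would become the first argument and be treated as the chapter number.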
Number Registers
When you use a specific value in a macro definition, you are limited to that value when
you use the macro. For example, in the paragraph macro definition shown previously,
the space will always be 1, and the indent always 5n.
However, nroff and troff allow you to save numeric values in special variables
known as number registers. If you use the value of a register in a macro definition,
the action of the macro can be changed just by placing a new value in the register.
For example, in ms, the size of the top and bottom margins is not specified with an
absolute value, but with a number register. As a result, you don't need to change the
macro definition to change these margins; you simply reset the value of the appropriate
number register. Just as importantly, the contents of number registers can be used as
flags (a kind of message between macros). There are conditional statements in the
markup language of nroff and troff, so that a macro can say: "If number register
Y has the value x, then do thus-and-so. Otherwise, do this." For example, in the mm
macros, hyphenation is turned off by default. To turn it on, you set the value of a certain
number register to 1. Various macros test the value of this register, and use it as a
signal to re-enable hyphenation.
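A sketch of the flag idea, using the .ie (if-else) and .el requests. The register name Hy and the macro name PG are hypothetical, chosen only for this example; they are not the names used by mm:

```troff
.\" Hypothetical flag register: 0 means no hyphenation.
.nr Hy 0
.\" Hypothetical paragraph macro that tests the flag on every call.
.de PG
.sp
.ie \\n(Hy=1 .hy 1
.el .nh
..
.\" Later in the file, resetting the flag changes what .PG does.
.nr Hy 1
.PG
This paragraph may now be hyphenated.
```

The backslash in \\n(Hy is doubled so that the register is read each time the macro is called, rather than once when it is defined.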
To store a value into a number register, use the .nr request. This request takes
two arguments: the name of a number register,* and the value to be placed into it.
For example, in the ms macros, the size of the top and bottom margins is stored
in the registers HM (header margin) and FM (footer margin). To reset these margins
from their default value of 1 inch to 1.75 inches (thus producing a shorter page like the
one used in this book), all you would need to do is to issue the requests:
.nr HM 1.75i
.nr FM 1.75i
You can also set number registers with single-character names from the command line
by using the -r option. (The mm macros make heavy use of this capability.) For
example:
$ nroff -mm -rN1 file
will format file using the mm macros, with number register N set to the value 1. We
will talk more about using number registers later, when we describe how to write your
own macros. For the moment, all you need to know is how to put new values into
existing registers. The next two chapters will describe the particular number registers
that you may find useful with the mm and m s macro packages.
Predefined Strings
The mm and ms macro packages also make use of some predefined text strings. The
nroff and troff programs allow you to associate a text string with a one- or
two-character string name. When the formatter encounters a special escape sequence
including the string name, the complete string is substituted in the output.
To define a string, use the .ds request. This request takes two arguments, the
string name and the string itself. For example:
.ds nt Nroff and Troff
The string should not be quoted. It can optionally begin with a quotation mark, but it
should not end with one, or the concluding quotation mark will appear in the output. If
you want to start a string with one or more blank spaces, though, you should begin the
definition with a quotation mark. Even in this case, there is no concluding quotation
mark. As always, the string is terminated by a newline.
*Number register names can consist of either one or two characters, just like macro names. However, they
are distinct-that is, a number register and a macro can be given the same name without conflict.
You can define a multiline string by hiding the newlines with a backslash. For
example:
.ds LS This is a very long string that goes over \
more than one line.
When the string is interpolated, it will be subject to filling (unless no-fill mode is in
effect) and may not be broken into lines at the same points as you’ve specified in the
definition. To interpolate the string in the output, you use one of the following escape
sequences:
\*a
\*(ab
where a is a one-character string name, and ab is a two-character string name.
To use the nt string we defined earlier, you would type:
\*(nt
It would be replaced in the output by the words Nroff and Troff.
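A short sketch that pulls these rules together. The string names T and Co are hypothetical, chosen for illustration; check that they do not collide with macro names in the package you are using:

```troff
.\" A one-character string name and a two-character string name.
.ds T Unix Text Processing
.ds nt Nroff and Troff
.\" A leading quotation mark preserves the leading blanks; there is no closing quote.
.ds Co "  (continued)
Chapter 4 of \*T is called \*(nt.
.br
Table 4-4\*(Co
```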
Strings use the same pool of names as macros. Defining a string with the same
name as an existing macro will make the macro inoperable, so it is not advisable to go
around wildly defining shorthand strings. The vi editor’s abbreviation facility
(described in Chapter 7) is a more effective way to save yourself work typing.
Strings are useful in macro design in much the same way number registers are-
they allow a macro to be defined in a more general way. For example, consider this
book, which prints the title of the chapter in the header on each odd-numbered page.
The chapter title is not coded into the page top macro. Instead, a predefined string is
interpolated there. The same macro that describes the format of the chapter title on the
first page of the chapter also defines the string that will appear in the header.
In using each of the existing macro packages, you may be asked to define or
interpolate the contents of an existing string. For the most part, though, string defini-
tions are hidden inside macro definitions, so you may not run across them. However,
there are a couple of handy predefined strings you may find yourself using, such as:
\*(DY
which always contains the current date in the ms macro package. (The equivalent
string in mm is \*(DT.) For example, if you wanted a form letter to contain the date
that it was formatted and printed rather than the date it was written, you could interpo-
late this string.
There is no magic to the options -ms and -mm. The actual option to nroff
and troff is -mx, which tells the program to look in the directory
/usr/lib/tmac for a file with a name of the form tmac.x. As you might expect,
this means that there is a file in that directory called tmac.s or tmac.m (depending
on which package you have on your system). It also means that you can invoke a
macro package of your own from the command line simply by storing the macro defini-
tions in a file with the appropriate pathname. This file will be added to any other files
in the formatting run. This means that if you are using the ms macros you could
achieve the same result by including the line:
.so /usr/lib/tmac/tmac.s
at the start of each source file, and omitting the command-line switch -ms. (The .so
request reads another file into the input stream, and when its contents have been
exhausted, returns to the current file. Multiple .so requests can be nested, not just to
read in macro definitions, but also to read in additional text files.)
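A sketch of a small "master" file built entirely from .so requests. Apart from /usr/lib/tmac/tmac.s, every file name here is hypothetical:

```troff
.\" Read in the ms macros, as -ms on the command line would.
.so /usr/lib/tmac/tmac.s
.\" Local macro definitions kept in a separate file (hypothetical path).
.so /usr/local/lib/macros/local.mac
.\" Chapter text read from other files; each may contain .so requests of its own.
.so ch01
.so ch02
```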
The macros in the standard macro packages are no different (other than in com-
plexity) than the macros you might write yourself. In fact, you can print out and study
the contents of the existing macro packages to learn how they work. We’ll be looking
in detail at the actions of the existing macro packages, but for copyright reasons we
can’t actually show their internal design. We’ll come back to all this later. For now,
all you need to know is that macros aren’t magic-just an assemblage of simple com-
mands working together.
CHAPTER 5
The ms Macros
The UNIX shell is a user interface for the kernel, the actual heart of the operating system.
You can choose the C shell or Korn shell instead of the Bourne shell, without
worrying about its effects on the low-level operations of the kernel. Likewise, a macro
package is a user interface for accessing the capabilities of the nroff/troff formatter.
Users can select either the ms or mm macro packages (as well as other packages
that are available on some systems) to use with nroff/troff.
The ms package was the original Bell Labs macro package, and is available on
many UNIX systems, but it is no longer officially supported by AT&T. Our main reason
for giving ms equal time is that many Berkeley UNIX systems ship ms instead of
mm. In addition, it is a less complex package, so it is much easier to learn the principles
of macro design by studying ms than by studying mm.
A third general-purpose package, called me, is also distributed with Berkeley
UNIX systems. It was written by Eric Allman and is comparable to ms and mm.
(Mark Horton writes us: I think of ms as the FORTRAN of nroff, mm as the PL/I,
and me as the Pascal.) The me package is not described in this book.
In addition, there are specialized packages-mv, for formatting viewgraphs,
mptx, for formatting the permuted index found in the UNIX Reference Manual, and
man, for formatting the reference pages in that same manual. These packages are sim-
ple and are covered in the standard UNIX documentation.
Regardless of which macro package you choose, the formatter knows only to
replace each call of a macro with its definition. The macro definition contains the set of
requests that the formatter executes. Whether a definition is supplied with the text in
the input file or found in a macro package is irrelevant to nroff/troff. The formatter
can be said to be oblivious to the idea of a macro package.
You might not expect this rather freely structured arrangement between a macro
package and nroff/troff. Macros are application programs of sorts. They organize
the types of functions that you need to be able to do. However, the actual work is
accomplished by nroff/troff requests.
In other words, the basic formatting capabilities are inherent in nroff and
troff; the user implementation of these capabilities to achieve particular formats is
accomplished with a macro package. If a macro doesn’t work the way you expect, its
definition may have been modified. It doesn’t mean that nroff/troff works differently
on your system. It is one thing to say “nroff/troff won’t let me do it,”
and another to say “I don’t have the macro to do it (but I could do it, perhaps).”
A general-purpose macro package like ms provides a way of describing the format
of various kinds of documents. Each document presents its own specific problems,
and macros help to provide a simple and flexible solution. The ms macro package is
designed to help you format letters, proposals, memos, technical papers, and reports.
For simple documents such as letters, ms offers few advantages over the basic format
requests described in Chapter 4. But as you begin to format more complex documents,
you will quickly see the advantage of working with a macro package, which provides
specialized tools for so many of the formatting tasks you will encounter.
A text file that contains ms macros can be processed by either nroff or
troff, and the output can be displayed on a terminal screen or printed on a line
printer, a laser printer, or a typesetter.
Thus, if you forget to reset the point size or indentation, you might notice that the
problem continues for a while and then stops.
Page Layout
As suggested in the last chapter, one of the most important functions of a macro package
is that it provides basic page layout defaults. This feature makes it worthwhile to
use a macro package even if you don’t enter a single macro into your source file.
At the beginning of Chapter 4, we showed how nroff alone formatted a sample
letter. If we format the same letter with ms, the text will be adjusted on a page that
has a default top and bottom margin of 1 inch, a default left margin, or page offset, of
about 1 inch, and a default line length of 6 inches.
All of these default values are stored in number registers so that you can easily
change them:
LL Line Length
HM Header (top) Margin
FM Footer (bottom) Margin
PO Page offset (left margin)
For example, if you like larger top and bottom margins, all you need to do is
insert the following requests at the top of your file:
.nr HM 1.5i
.nr FM 1.5i
Registers such as these are used internally by a number of ms macros to reset the
formatter to its default state. They will not take effect until one of those “reset” mac-
ros is encountered. In the case of HM and FM, they will not take effect until the next
page unless they are specified at the very beginning of the file.*
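A sketch of a file opening that sets all four layout registers before any text. The values are illustrative assumptions, and the lines are meant to sit at the very top of the file:

```troff
.\" Page layout registers, set before the first ms macro is called.
.nr LL 5.5i
.nr PO 1.5i
.nr HM 1.5i
.nr FM 1.5i
.\" The first paragraph macro performs the reset that puts the values to work.
.LP
This paragraph is set with a 5.5-inch line length
on a page with a 1.5-inch page offset.
```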
Paragraphs
As we saw in the last chapter, paragraph transitions are natural candidates for macros
because each paragraph generally will require several requests (spacing, indentation) for
proper formatting.
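There are four paragraph macros in ms:
.LP Block paragraph
.PP First line of paragraph indented
.QP Paragraph indented from both margins
.IP Paragraph with hanging indent (list item)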
*These “reset” macros (those that call the internal macro .RT) include .LP, .PP, .IP, .QP,
.SH, .NH, .RS, .RE, .TS, and .TE. The very first reset macro calls a special initialization
macro called .BG that is used only once, on the first page. This macro prints the cover sheet, if any (see
“Cover Sheet Macros” later in this chapter), as well as performing some special first-page initialization.
The .LP macro produces a justified, block paragraph. This is the type of paragraph
used for most technical documentation. The .PP macro produces a paragraph
with a temporary indent for the first line. This paragraph type is commonly used in
published books and magazines, as well as in typewritten correspondence.
Let’s use the same letter to illustrate the use of these macros. In the original
example (in Chapter 4), we left blank lines between paragraphs, producing an effect
similar to that produced by the .LP macro.
In contrast, .PP produces a standard indented paragraph. Let’s code the letter
using .PP macros. Because this is a letter, let’s also disable justification with an
.na request. And of course, we want to print the address block in no-fill mode, as
shown in Chapter 4. Figure 5-1 shows the coded letter and Figure 5-2 shows the for-
matted output.
Quoted Paragraphs
A paragraph that is indented equally from the left and right margins is typically used to
display quoted material. It is produced by .QP. For example:
.QP
In the next couple of days, I’ll be putting together a ...
.ad r
April 1, 1987
.sp 2
.ad
.nf
Mr. John Fust
Vice President, Research and Development
Gutenberg Galaxy Software
Waltham, Massachusetts 02159
.fi
.sp
.na
Dear Mr. Fust:
.PP
In our conversation last Thursday, we discussed a documentation
project that would produce a user's manual on the Alcuin
product. Yesterday, I received the product demo and other
materials that you sent me.
.PP
Going through a demo session gave me a much better understanding
of the product. I confess to being amazed by Alcuin.
Some people around here, looking over my shoulder, were also
astounded by the illustrated manuscript I produced with Alcuin.
One person, a student of calligraphy, was really impressed.
.PP
April 1, 1987
Fred Caslon
The .QP macro produces a paragraph indented on both sides. The pair of macros
.QS and .QE can be used to mark a section longer than one paragraph that is
indented. This is useful in reports and proposals that quote at length from another
source.
.LP
I was particularly interested in the following comment
found in the product specification:
.QS
Users first need a brief introduction to what
the product does. Sometimes this is more for the
benefit of people who haven't yet bought the
product, and are just looking at the manual.
However, it also serves to put the rest of the
manual, and the product itself, in
the proper context.
.QE
The result of formatting is:
I was particularly interested in the following comment
found in the product specification:
     Users first need a brief introduction to what the
     product does. Sometimes this is more for the bene-
     fit of people who haven't yet bought the product,
     and are just looking at the manual. However, it
     also serves to put the rest of the manual, and the
     product itself, in the proper context.
.IP figure 10
is the name of a cataloged figure. If
a figure has not been cataloged, you need to use
the LOCATE command.
.IP f:p 10
is the scale of the
figure in relation to the page.
.IP font 10
is the two-character abbreviation or
full name of one of the available fonts
from the Alcuin library.
The following item list is produced:
figure    is the name of a cataloged figure. If a figure
          has not been cataloged, you need to use the
          LOCATE command.

figure    is the name of a cataloged figure. If a
          figure has not been cataloged, you need to
          use the LOCATE command.
You can specify an absolute or relative indent. To achieve the effect of a nested list,
you can use the .RS (you can think of this as either relative start or right shift) and
.RE (relative end or retreat) macros:
.IP font 10
is the two-character abbreviation or
full name of one of the available fonts
from the Alcuin library.
.RS
.IP cu
Cursive
.IP RS
Slanted
.RS
.IP LH 5 0
Left handed
.IP RH 5 0
Right handed
.RE
.IP BL
Block
.RE
The labels on the second level are aligned with the indented left margin of paragraphs
on the first level.
might produce the following:
.B bold
.I italic
.R roman
Each macro prints a single argument in a particular font. You might code a single sen-
tence as follows:
.B Alcuin
revitalizes an
.I age-old
tradition.
The printed sentence has one word in bold and one in italic.
beautiful
.R
handwriting;
You've already seen that the first argument is changed to the selected font. If you
supply a second argument, it is printed in the previous font. (You are limited to two
arguments, set off by a space; a phrase must be enclosed within quotation marks to be
taken as a single argument.) A good use for the alternate argument is to supply punc-
tuation, especially because of the restriction that you cannot begin a line with a period.
its opposite is
.B cacography .
This example produces:
its opposite is cacography.
If the second argument is a word or phrase, you must supply the spacing:
The ink pen has been replaced by a
.I light " pen."
This produces:
The ink pen has been replaced by a light pen.
If you are using nroff, specifying a bold font results in character overstrike; specifying
an italic font results in an underline for each character (not a continuous rule).
Overstriking and underlining can cause problems on some printers and terminals.
The chief advantage of these macros over the corresponding troff constructs is
the ease of entry. It is easier to type:
.B calligraphy
than:
\fBcalligraphy\fP
However, you'll notice that using these macros changes the style of your input consider-
ably. As shown in the examples on the preceding pages, these macros require you to
code your input file using short lines that do not resemble the resulting filled output
text.
This style, which clearly divorces the form of the input from the form of the output,
is recommended by many nroff and troff users. They recommend that you
use macros like these rather than inline codes, and that you begin each sentence or
clause on a new line. There are advantages in speed of editing. However, there are
others (one of the authors included) who find this style of input unreadable on the
screen, and prefer to use inline codes, and to keep the input file as readable as possible.
(There is no difference in the output file.)
Underlining
If you want to underline a single word, regardless of whether you are using nroff or
troff, use the .UL macro:
the
.UL art
of calligraphy.
It will print a continuous rule beneath the word. You cannot specify more than a sin-
gle word with this macro.
At the top of a document, these settings will take effect immediately. Otherwise, you
must wait for the next paragraph macro for the new values to be recognized. If you
need both immediate and long-lasting effects, you may need a construct like:
.ps 8
.nr PS 8
.vs 12
.nr VS 12
There are also several macros for making local point size changes. The .LG macro
increases the current point size by 2 points; the .SM macro decreases the point size by
2 points. The new point size remains in effect until you change it. The .NL macro
changes the point size back to its default or normal setting. For example:
.LG
Alcuin
.NL
is a graphic arts product for
.SM
UNIX
.NL
systems.
Alcuin is a graphic arts product for UNIX systems.
The .LG and .SM macros simply increment or decrement the current point size
by 2 points. Because you change the point size relative to the current setting, repeating
a macro adds or subtracts 2 more points. If you are going to change the point size by
more than 2, it makes more sense to use the .ps request. The .NL macro uses the
value of the number register PS to reset the normal point size. Its default value is 10.
In the following example, the .ps request changes the point size to 12. The
.LG and .SM macros increase and decrease the point size relative to 12 points. The
.NL macro is not used until the end because it changes the point size back to 10.
.ps 12
.LG
Alcuin
.SM
is a graphic arts product for
.SM
UNIX
.LG
systems.
.NL
It produces the following line:
Displays
A document often includes material (such as tables, figures, or equations) that is not
a part of the running text, and must be kept together on the page. In ms and mm, such
document elements are referred to generically as displays.
The macros .DS, .DE, .ID, .CD, and .LD are used to handle displays in
ms. The display macros can be relied upon to provide
The default action of the .DS macro is to indent the block of text without filling lines:
Some of the typefaces that are currently available are:
.DS
Roman
Caslon
Baskerville
Helvetica
.DE
This produces:
Roman
Caslon
Baskerville
Helvetica
I Indented (default)
L Left-justified
C Center each line
B Block (center entire display)
.DS L
Dates Description of Task
2
3
4
5
.LD
Long Display
.DE
6
7
8
9
10
The following two formatted pages might be produced, assuming that there are a suffi-
cient number of lines to cause a page break:
-1- -2-
Long Display
8
9
10
If there had been room on page 1 to fit the display, it would have been placed there, and
lines 6 and 7 would have followed the display, as they did in the input file.
If a static display had been specified in the previous example, the display would
be placed in the same position on the second page, and lines 6 and 7 would have fol-
lowed it, leaving extra space at the bottom of page 1. A floating display attempts to
make the best use of the available space on a page.
The formatter maintains a queue to hold floating displays that it has not yet out-
put. When the top of a page is encountered, the next display in the queue is output.
The queue is emptied in the order in which it was filled (first in, first out).
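The queueing behavior just described can be sketched in a few lines of Python. This is only an illustrative model under simplifying assumptions of our own (every text line is one line high, a display has a fixed height, and a display never exceeds a page); the function name and the tuple format are hypothetical and say nothing about how the macros are actually implemented.

```python
from collections import deque

def paginate(items, page_length):
    """Lay out text lines and floating displays on fixed-length pages.

    items holds ("line", text) or ("float", title, height) tuples.
    A floating display that does not fit in the space left on the
    current page is queued; queued displays are output at the top of
    the next page, first in, first out, while text keeps flowing.
    """
    pages = [[]]          # each page is a list of output items
    used = 0              # lines used on the current page
    queue = deque()       # displays waiting for the top of a page

    def start_page():
        nonlocal used
        pages.append([])
        used = 0
        # Emit waiting displays, oldest first, while they still fit.
        while queue and used + queue[0][1] <= page_length:
            title, height = queue.popleft()
            pages[-1].append(title)
            used += height

    for item in items:
        if item[0] == "float":
            _, title, height = item
            if height > page_length:
                raise ValueError("display is taller than a page")
            if not queue and used + height <= page_length:
                pages[-1].append(title)      # room here: no need to float
                used += height
            else:
                queue.append((title, height))
        else:
            while used + 1 > page_length:
                start_page()
            pages[-1].append(item[1])
            used += 1

    while queue:                             # flush displays at end of input
        start_page()
    return pages

text = [("line", str(n)) for n in range(1, 11)]
items = text[:5] + [("float", "Long Display", 4)] + text[5:]
for page in paginate(items, page_length=7):
    print(page)
```

With a page length of 7 lines and a 4-line display, lines 6 and 7 fill out the first page and the display opens the second, as in the two pages shown above.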
The macros called by the display macros to control output of a block of text are
available for other uses. They are known as “keep and release” macros. The pair
.KS/.KE keep a block together and output it on the next available page. The pair
.KF/.KE specify a floating keep; the block saved by the keep can float and lines of
text following the block may appear before it in the text.
Headings
In ms, you can have numbered and unnumbered headings. There are two heading
macros: .NH for numbered headings and .SH for unnumbered section headings.
Let’s first look at how to produce numbered headings. The syntax for the .NH
macro is:
.NH [level]
[heading text]
.LP
(The brackets indicate optional arguments.) You can supply a numerical value indicat-
ing the level of the heading. If no value is provided for level, then a top-level heading
is assumed. The heading text begins on the line following the macro and can extend
over several lines. You have to use one of the paragraph macros, either .LP or .PP,
after the last line of the heading. For example:
.NH
Quick Tour of Alcuin
.LP
The result is a heading preceded by a first-level heading number:
1. Quick Tour of Alcuin
The next time you use this macro the heading number will be incremented to 2, and
after that, to 3.
You can add levels by specifying a numeric argument. A second-level heading is
indicated by 2:
.NH 2
Introduction to Calligraphy
.LP
The first second-level heading number is printed:
1.1 Introduction to Calligraphy
When another heading is specified at the same level, the heading number is
automatically incremented. If the next heading is at the second level:
.NH 2
Digest of Alcuin Commands
.LP
ms produces:
1.2 Digest of Alcuin Commands
Each time you go to a new level, .1 is appended to the number representing the
existing level. That number is incremented for each call at the same level. When you back
out of a level (for instance, when you go from level 5 to 4) the counter for the level (in
this case level 5) is reset to 0.
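The numbering rules can be modeled with one counter per level. The following Python sketch is a hypothetical illustration (the class and method names are ours, and the limit of five levels is an assumption); it reproduces the numbers shown in the examples above but is not the ms implementation.

```python
class HeadingNumbers:
    """Hypothetical model of .NH numbering, one counter per level."""

    def __init__(self, max_level=5):      # five levels is an assumption
        self.counters = [0] * max_level

    def nh(self, level=1):
        """Return the number printed for a heading at the given level."""
        if not 1 <= level <= len(self.counters):
            raise ValueError("unsupported heading level")
        self.counters[level - 1] += 1
        # Backing out of deeper levels resets their counters to 0.
        for i in range(level, len(self.counters)):
            self.counters[i] = 0
        number = ".".join(str(n) for n in self.counters[:level])
        # A top-level number is followed by a period ("1. Quick Tour").
        return number + "." if level == 1 else number

h = HeadingNumbers()
print(h.nh())     # 1.
print(h.nh(2))    # 1.1
print(h.nh(2))    # 1.2
print(h.nh())     # 2.
print(h.nh(2))    # 2.1
```

Because the second-level counter is reset when a new first-level heading appears, the first .NH 2 under heading 2 starts again at 2.1.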
The macro for unnumbered headings is .SH:
.SH
Introduction to Calligraphy
.LP
Unnumbered headings and numbered headings can be intermixed without affecting the
numbering scheme:
1. Quick Tour of Alcuin
Introduction to Calligraphy
1.1 Digest of Alcuin Commands
Headings are visible keys to your document’s structure. Their appearance can
contribute significantly to a reader recognizing that organization. If you are using
unnumbered headings, it becomes even more important to make headings stand out. A
simple thing you can do is use uppercase letters for a first-level heading.
.TL Title
.AU Author
.AI Author’s Institution
.AB Abstract Start
.AE Abstract End
These macros are general enough that you can still use them even if you aren’t from
Bell Laboratories.
Each macro takes its data from the following line(s) rather than from an argument.
They are typically used together. For example:
.TL
UNIX Text Processing
.AU
Dale Dougherty
.AU
Tim O’Reilly
.AI
O’Reilly & Associates, Inc.
.AB
This book provides a comprehensive introduction to the major
UNIX text-processing tools. It includes a discussion of
vi, ex, nroff, and troff, as
well as many other text-processing programs.
.AE
.LP
Exactly how the output will look depends on which document types you have selected.
If you don’t specify any of the formats, you will get something like this:
Dale Dougherty
Tim O’Reilly
O’Reilly & Associates, Inc.
ABSTRACT
This book provides a comprehensive introduction to
the major UNIX text-processing tools. It includes a
discussion of vi, ex, nroff, and troff, as
well as many other text-processing programs.
You can specify as many title lines as you want following .TL. The macro will be
terminated by any of the other cover sheet macros, or by any paragraph macro. For
multiple authors, .AU and .AI can be repeated up to nine times.
The cover sheet isn’t actually printed until a reset (such as that caused by any of
the paragraph macros) is encountered, so if you want to print only a cover page, you
should conclude it with a paragraph macro even if there is no following text.
In addition, if you use these macros without one of the overall document type
macros like .RP, the cover sheet will not be printed separately. Instead, the text will
immediately follow. Insert a .bp if you want a separate cover sheet.
Miscellaneous Features
To move to the next menu, press the
.BX RETURN
key.
This draws a box around the word RETURN.
To move to the next menu, press the |RETURN| key.
As you can see, it might be a good idea to reduce the point size of the boxed word.
You can enclose a block of material within a box by using the pair of macros
.B1 and .B2:
.B1
.B
.ce
Note to Reviewers
.R
.LP
Can you get a copy of a manuscript without annotations?
It seems to me that you should be
able to mark up a page with comments or
other scribbles while in Annotation Mode and
still obtain a printed copy without these marks.
Any ideas?
.sp
.B2
This example produces the following boxed section in troff:
Note to Reviewers
Can you get a copy of a manuscript without annotations? It seems to me that you
should be able to mark up a page with comments or other scribbles while in Annota-
tion Mode and still obtain a printed copy without these marks. Any ideas?
You may want to place boxed information inside a pair of keep or display macros. This
will prevent the box macro from breaking if it crosses a page boundary. If you use
these macros with nroff, you must also pipe your output through the col
postprocessor as described in Chapter 4.
Footnotes
Footnotes present special problems; the main one is printing the text at the bottom of the
page. The .FS macro indicates the start of the text for the footnote, and .FE
indicates the end. These macros surround the footnote text that
will appear at the bottom of the page. The .FS macro is put on the line immediately
following some kind of marker, such as an asterisk, that you supply in the text and in
the footnote.
All the footnotes are collected and output at the bottom of each page underneath a short
rule. The footnote text is printed in smaller type, with a slightly shorter line length than
the body text. However, you can change these if you want.
Footnotes in ms use an nroff/troff feature called environments (see
Chapter 14), so that parameters like line length or font that are set inside a footnote are
saved independently of the body text. So, for example, if you issued the requests:
.FS
.ft B
.ll -5n
.in +5n
Some text
.
.
.
.FE
the text within the footnote would be printed in boldface, with a 5-en indent, and the
line length would be shortened by 5 ens. The text following the footnote would be
unaffected by those formatting requests. However, the next time a footnote was called,
that special formatting would again be in effect.
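The way an environment isolates footnote settings from the body text can be pictured as two independent sets of parameters. The Python sketch below is a hypothetical illustration only; the class, the environment names, and the parameter keys are our own inventions, and the real mechanism is the one described in Chapter 14.

```python
class Formatter:
    """Hypothetical model of two independent formatting environments."""

    def __init__(self):
        defaults = {"font": "R", "indent_ens": 0}
        # Each environment keeps its own copy of the parameters.
        self.envs = {"body": dict(defaults), "footnote": dict(defaults)}
        self.current = "body"

    def switch(self, name):
        """Make the named environment the active one."""
        self.current = name

    def set(self, key, value):
        """Change a parameter in the active environment only."""
        self.envs[self.current][key] = value

    def get(self, key):
        return self.envs[self.current][key]

f = Formatter()
f.switch("footnote")          # as if .FS had been called
f.set("font", "B")            # .ft B
f.set("indent_ens", 5)        # .in +5n
f.switch("body")              # as if .FE had been called
print(f.get("font"))          # R: the body text is unaffected
f.switch("footnote")          # the next footnote
print(f.get("font"))          # B: the earlier settings are still in effect
```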
*"Publish or Perish: Start-up grabs early page language
lead," Computerworld, April 21, 1986, p. 1.
If a footnote is too long to fit on one page, it will be continued at the bottom of the next
page.
Two-Column Processing
One of the nice features of the ms macros is the ease with which you can create
multiple columns and format documents, such as newsletters or data sheets, that are best
suited to a multicolumn format.
To switch to two-column mode, simply insert the .2C macro. To return to
single-column mode, use .1C. Because of the way two-column processing works in
ms, you can switch to two-column mode in the middle of a page, but switching back to
a single column forces a page break. (You'll understand the reason for this when we
return to two-column processing in Chapter 16.)
The default column width for two-column processing is 7/15th of the line length.
It is stored in the register CW (column width). The gutter between the columns is
1/15th of the line length, and is stored in the register GW (gutter width). By changing
the values in these registers, you can change the column and gutter width.
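As a quick check on the arithmetic, the default widths can be computed for any line length. The Python function below is a hypothetical helper of our own, and the 6-inch line length is only an example value.

```python
from fractions import Fraction

def two_column_defaults(line_length):
    """Return (column width, gutter width) for a given line length.

    Uses the default proportions described above: each column is 7/15
    of the line length and the gutter is 1/15.
    """
    ll = Fraction(line_length)
    return ll * Fraction(7, 15), ll * Fraction(1, 15)

cw, gw = two_column_defaults(6)       # an example 6-inch line length
print(float(cw), float(gw))           # 2.8 0.4
print(2 * cw + gw == 6)               # True: two columns plus the gutter fill the line
```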
For more than two columns, you can use the .MC macro. This macro takes two
arguments, the column width and the gutter width, and creates as many columns as will
fit in the line length. For example, if the line length is 7 inches, the request:
.MC 2i .3i
would create three columns 2 inches wide, with a gutter of .3 inches between the
columns.
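The count of three follows from a simple fit calculation: n columns need n column widths plus n-1 gutters. Here is a minimal sketch in Python, assuming that this is all .MC takes into account; the function name is hypothetical.

```python
from fractions import Fraction

def columns_that_fit(line_length, column_width, gutter_width):
    """Return how many columns fit in the line length.

    n columns occupy n*column_width + (n-1)*gutter_width, so the
    largest n satisfies n <= (line_length + gutter) / (column + gutter).
    """
    ll, cw, gw = (Fraction(str(x)) for x in (line_length, column_width, gutter_width))
    if cw <= 0 or gw < 0:
        raise ValueError("widths must be positive")
    return int((ll + gw) // (cw + gw))

print(columns_that_fit(7, 2, 0.3))    # 3, using 6.6 of the 7 inches
```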
Again, .1C can be used to return to single-column mode. In some versions of
ms, the .RC macro can be used to break columns. If you are in the left column,
following text will go to the top of the next column. If you are in the right column, .RC
will start a new page.
.ds LH GGS
.ds CH Alcuin Project Proposal
.ds RH \*(DY
.ds CF Page %
You may notice that we use the string DY to supply today’s date in the header. In the
footer, we use a special symbol (%) to access the current page number. Here are the
resulting header and footer:
GGS               Alcuin Project Proposal               April 26, 1987
                               Page 2
Normally, you would define the header and footer strings at the start of the document,
so they would take effect throughout. However, note that there is nothing to prevent
you from changing one or more of them from page to page. (Changes to a footer string
will take effect on the same page; changes to a header string will take effect at the top
of the next page.)
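To see how the left, center, and right strings share a title line, consider this small Python sketch. It is a hypothetical model: the function name, the 65-character width, and the literal date are assumptions for illustration, and the formatter itself works in its own units rather than character cells.

```python
def title_line(left, center, right, width, page):
    """Place three strings on one line, replacing % with the page number."""
    left, center, right = (s.replace("%", str(page)) for s in (left, center, right))
    center_start = (width - len(center)) // 2
    right_start = width - len(right)
    if center_start < len(left) or center_start + len(center) > right_start:
        raise ValueError("strings do not fit in the line")
    line = [" "] * width
    line[0:len(left)] = list(left)                                   # flush left
    line[center_start:center_start + len(center)] = list(center)    # centered
    line[right_start:width] = list(right)                            # flush right
    return "".join(line).rstrip()

# Header built from the LH, CH, and RH strings defined above
# (the date is written out literally in place of the DY string).
print(title_line("GGS", "Alcuin Project Proposal", "April 26, 1987", 65, 2))
# Footer built from CF alone; LF and RF are empty.
print(title_line("", "Page %", "", 65, 2))
```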
Extensions to ms
In many ways, ms can be used to give you a head start on defining your own macro
package. Many of the features that are missing in ms can be supplied by user-defined
macros. Many of these features are covered in Chapters 14 through 18, where, for
example, we show macros for formatting numbered lists.
*This problem actually can occur on any page, but is most frequently encountered on the first page.
CHAPTER 6
The mm Macros
A macro package provides a way of describing the format of various kinds of docu-
ments. Each document presents its own specific problems, and macros help to provide
a simple and flexible solution. The mm macro package is designed to help you format
letters, proposals, memos, technical papers, and reports. A text file that contains mm
macros can be processed by either nroff or troff, the two text formatting
programs in UNIX. The output from these programs can be displayed on a terminal screen
or printed on a line printer, a laser printer, or a typesetter.
Some users of the mm macro package learn only a few macros and work produc-
tively. Others choose from a variety of macros to produce a number of different for-
mats. More advanced users modify the macro definitions and extend the capabilities of
the package by defining their own special-purpose macros.
Macros are the words that make up a format description language. Like words,
the result of a macro is often determined by context. That is, you may not always
understand your output by looking up an individual macro, just like you may not under-
stand the meaning of an entire sentence by looking up a particular word. Without exa-
mining the macro definition, you may find it hard to figure out which macro is causing
a particular result. Macros are interrelated; some macros call other macros, like a sub-
routine in a program, to perform a particular function.
After finding out what the macro package allows you to do, you will probably
decide upon a particular format that you like (or one that has evolved according to the
decisions of a group of people). To describe that format, you are likely to use only a
few of the macros, those that do the job. In everyday use, you want to minimize the
number of codes you need to format documents in a consistent manner.
If you are using otroff, be sure you don’t let troff send the output to your
terminal because, in all probability, it will cause your terminal to hang, or at least to
scream and holler.
In this chapter, we will generally show the results of the mm command, rather
than mmt; that is, we’ll be showing nroff rather than troff. Where the subject
under discussion is better demonstrated by troff, we will show troff output
instead. We assume that by now, you will be able to tell which of the programs has
been used, without our mentioning the actual commands.
Sometimes, you won’t get error messages, but your output will break midway.
Generally, you have to go into the file at the point where it broke, or before that point, and
examine the macros or a sequence of macros. You can also run a program on the input
file to examine the code you have entered. This program, available at most sites, is
called checkmm.
Default Formatting
In Chapter 4, we looked at a sample letter formatted by nroff. It might be
interesting, before putting any macros in the file, to see what happens if we format letter
as it is, this time using the mm command to read in the mm macro package.
Refer to Figure 6-1 and note that
- 1 -
April 1, 1987
Sincerely,
Fred Caslon
Page Layout
When you format a page with mm, the formatter is instructed to provide several lines at
the top and the bottom of the page for a header and a footer. By default, a page number
appears on a single line in the header and only blank lines are printed for the footer.
There are basically two different ways to change the default header and footer.
The first way is to specify a command-line parameter with the mm or mmt commands
to set the number register N. This allows you to affect how pages are numbered and
where the page number appears. The second way is to specify in the input file a macro
that places text in the header or footer. Let’s look at both of these techniques.
The other type of change affects whether or not the page number is printed in the
header at the top of the first page.
The number register N controls these actions. This register has a default setting
of 0 and can take values from 0 through 5. Table 6-1 shows the effect of these values.
Value   Action
0       The page number prints in the header on all pages.
        This is the default page numbering style.
1       On page 1, the page number is printed in place of
        the footer.
2       On page 1, the page number is not printed.
3       All pages are numbered by section, and the page
        number appears in the footer. This setting affects
        the defaults of several section-related registers and
        macros. It causes a page break for a top-level
        heading (Ej=1), and invokes both the .FD and .RP
        macros to reset footnote and reference numbering.
4       The default header containing the page number is
        suppressed, but it has no effect on a header supplied
        by a page header macro.
5       All pages are numbered by section, and the page
        number appears in the footer. In addition, labeled
        displays (.FG, .TB, .EX, and .EC) are also
        numbered by section.
The register N can be set from the command line using the -r option. If we set
it to 2, no page number will appear at the top of page 1 when we print the sample letter:
$ mm -rN2 letter | lp
In the footer, we use a special symbol (%) to access the current page number. Only text
to be centered was specified; however, the four delimiters were still required to place
the text correctly. This footer appears at the bottom of the page:
Page 2
The header and footer macros override the default header and footer.
Setting Other Page Control Registers
The mm package uses number registers to supply the values that control line length,
page offset, point size, and page length, as shown in Table 6-2.
These registers must be defined before the mm macro package is read by nroff
or troff. Thus, they can be set from the command line using the -r option, as we
showed when we gave a new value for register N. Values of registers O and W for
nroff must be given in character positions (depending on the character size of the
output device for nroff, .5i might translate as either 5 or 6 character positions), but
troff can accept any of the units described in Chapter 4. For example:
$ mm -rN2 -rW65 -rO10 file
but:
$ mmt -rN2 -rW6.5i -rO1i file
Or the page control registers can be set at the top of your file, using the .so request to
read in the mm macro package, as follows:
.nr N 2
.nr W 65
.nr O 10
.so /usr/lib/tmac/tmac.m
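The conversion from inches to nroff character positions is simple multiplication by the number of characters per inch. The Python sketch below assumes output devices that print 10 or 12 characters per inch, which is consistent with the "5 or 6 character positions" mentioned above; the function names are hypothetical.

```python
from fractions import Fraction

def inches_to_positions(inches, chars_per_inch):
    """Convert a measurement in inches to whole character positions."""
    return int(Fraction(str(inches)) * chars_per_inch)

def nroff_register_options(width_in, offset_in, chars_per_inch):
    """Build -r options for the W and O registers for use with mm."""
    return [
        f"-rW{inches_to_positions(width_in, chars_per_inch)}",
        f"-rO{inches_to_positions(offset_in, chars_per_inch)}",
    ]

print(inches_to_positions(0.5, 10))            # 5
print(inches_to_positions(0.5, 12))            # 6
print(nroff_register_options(6.5, 1, 10))      # ['-rW65', '-rO10']
```

At 10 characters per inch, the nroff settings W65 and O10 correspond to the 6.5i and 1i given to mmt.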
Paragraphs
The .P macro marks the beginning of a paragraph.
.P
In our conversation last Thursday, we discussed a
This macro produces a left-justified, block paragraph. A blank line in the input file also
results in a left-justified, block paragraph, as you saw when we formatted an uncoded
file.
However, the paragraph macro controls a number of actions in the formatter,
many of which can be changed by overriding the default values of several number regis-
ters. The .P macro takes a numeric argument that overrides the default paragraph
type, which is a block paragraph. Specifying 1 results in an indented paragraph:
.P 1
Going through a demo session gave me a much better
The first three paragraphs formatted for the screen follow:
The first line of the second paragraph is indented five spaces. (In troff the default
indent is three ens.) Notice that the paragraph type specification changes only the
second paragraph. The third paragraph, which is preceded in the input file by .P
without an argument, is a block paragraph.
If you want to create a document in which all the paragraphs are indented, you
can change the number register that specifies the default paragraph type. The value of
Pt is 0 by default, producing block paragraphs. For indented paragraphs, set the value
of Pt to 1. Now the .P macro will produce indented paragraphs.
.nr Pt 1
If you want to obtain a block paragraph after you have changed the default type,
specify an argument of 0:
.P 0
When you specify a type argument, it overrides whatever paragraph type is in effect.
There is a third paragraph type that produces an indented paragraph with some
exceptions. If Pt is set to 2, paragraphs are indented except those following section
headings, lists, and displays. It is the paragraph type used in this book.
The following list summarizes the three default paragraph types:
0 Block
1 Indented
2 Indented with exceptions
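The interaction between the Pt register and an argument to .P can be summarized as a small decision function. This Python sketch is a hypothetical restatement of the rules above (the function and parameter names are ours), not a description of the macro's internals.

```python
def first_line_indented(pt=0, arg=None, follows_heading_list_or_display=False):
    """Decide whether a paragraph's first line is indented.

    pt  is the value of the Pt register (the default paragraph type).
    arg is the optional argument to .P, which overrides pt when given.
    """
    kind = pt if arg is None else arg
    if kind == 0:
        return False                                  # block paragraph
    if kind == 1:
        return True                                   # indented paragraph
    if kind == 2:
        return not follows_heading_list_or_display    # indented with exceptions
    raise ValueError("paragraph type must be 0, 1, or 2")

print(first_line_indented())                 # False: default block paragraph
print(first_line_indented(arg=1))            # True:  .P 1
print(first_line_indented(pt=1, arg=0))      # False: .P 0 overrides Pt
print(first_line_indented(pt=2, follows_heading_list_or_display=True))  # False
```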
Vertical Spacing
The paragraph macro also controls the spacing between paragraphs. The amount of
space is specified in the number register Ps. This amount differs between nroff
and troff.
With nroff, the .P macro has the same effect as a blank line, producing a full
space between paragraphs. However, with troff, the .P macro outputs a blank
space that is equal to one-half of the current vertical spacing setting. Basically, this
means that a blank line will cause one full space to be output, and the .P macro will
output half that space.
The .P macro invokes the .SP macro for vertical spacing. This macro takes a
numeric argument requesting that many lines of space.
Sincerely,
.SP 3
Fred Caslon
Three lines of space will be provided between the closing and the signature lines.
You do not achieve the same effect if you enter .SP macros on three
consecutive lines. The vertical space does not accumulate and one line of space is output, not
three.
Two or more consecutive .SP macros with numeric arguments result in the
spacing specified by the greatest argument. The other arguments are ignored.
.SP 5
.SP
.SP 2
In this example, five lines are output, not eight.
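Put another way, consecutive .SP calls behave like a maximum rather than a sum. A minimal Python sketch of that rule follows; the function name is hypothetical, and None stands for an .SP with no argument, which we take to request one line.

```python
def space_output(requests):
    """Return the lines of space produced by consecutive .SP macros.

    requests lists the argument of each .SP in order; None means the
    macro was given without an argument and counts as one line.
    The space does not accumulate: only the greatest request is used.
    """
    if not requests:
        return 0
    return max(1 if r is None else r for r in requests)

print(space_output([5, None, 2]))        # 5, not 8
print(space_output([None, None, None]))  # 1, not 3
```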
Because the .P macro calls the .SP macro, two or more consecutive
paragraph macros will have the same effect as one.
The argument specified with the .SP macro cannot be scaled, nor can it be a
negative number. The .SP macro automatically works in the scale (v) of the current
vertical spacing. However, both .SP and .sp accept fractions, so that each of the
following codes has the same result:
.sp .3v    .SP .3    .sp .3
Justification
A document formatted by nroff with mm produces, by default, unjustified text (an
uneven or ragged-right margin). When formatted by troff, the same document is
automatically justified (the right margin is even).
If you are using both nroff and troff, it is probably a good idea to
explicitly set justification on or off rather than depend upon the default chosen by the
formatter. Use the .SA macro (set adjustment) to set document-wide justification. An
argument of 0 specifies no justification; 1 specifies justification.
If you insert this macro at the top of your file:
.SA 1
Word Hyphenation
One way to achieve better line breaks and more evenly filled lines is to instruct the
formatter to perform word hyphenation.
Hyphenation is turned off in the mm macro package. This means that the
formatter does not try to hyphenate words to make them fit on a line unless you request it
by setting the number register Hy to 1. If you want the formatter to automatically
hyphenate words, insert the following line at the top of your file:
.nr Hy 1
Most of the time, the formatter breaks up a word correctly when hyphenating.
Sometimes, however, it does not and you have to explicitly tell the formatter either how to
split a word (using the .hy request) or not to hyphenate at all (using the .nh
request).
Displays
When we format a text file, the line breaks caused by carriage returns are ignored by
nroff/troff. How text is entered on lines in the input file does not affect how
lines are formed in the output. It doesn’t really matter whether information is typed on
three lines or four; it appears the same after formatting.
You probably noticed that the name and address at the beginning of our sample
file did not come out in block form. The four lines of input ran together and produced
two filled lines of output:
Mr. John Fust Vice President, Research and Development
Gutenberg Galaxy Software Waltham, Massachusetts 02159
The formatter, instead of paying attention to carriage returns, acts on specific macros or
requests that cause a break, such as .P, .SP, or a blank line. The formatter request
.br is probably the simplest way to break a line:
Mr. John Fust
.br
Vice President, Research and Development
The .br request is most appropriate when you are forcing a break of a single line.
For larger blocks of text, the mm macro package provides a pair of macros for
indicating that a block of text should be output just as it was entered in the input file. The
.DS (display start) macro is placed at the start of the text, and the .DE (display end)
macro is placed at the end:
.DS
Mr. John Fust
Vice President, Research and Development
Gutenberg Galaxy Software
Waltham, Massachusetts 02159
.DE
The formatter does not fill these lines, so the address block is output on four lines, just
as it was typed. In addition, the .DE macro provides a line of space following the
display.
.nr Pt 1
.SA 1
April 1, 1987
.SP 2
.DS
Mr. John Fust
Vice President, Research and Development
Gutenberg Galaxy Software
Waltham, Massachusetts 02159
.DE
Dear Mr. Fust:
.P
In our conversation last Thursday, we discussed a
documentation project that would produce a user's manual
on the Alcuin product. Yesterday, I received the product
demo and other materials that you sent me.
.P
Going through a demo session gave me a much better
understanding of the product. I confess to being amazed
by Alcuin. Some people around here, looking over my
shoulder, were also astounded by the illustrated
manuscript I produced with Alcuin. One person, a student
of calligraphy, was really impressed.
.P
In the next couple of days, I'll be putting together a
written plan that presents different strategies for
documenting the Alcuin product. After I submit this plan,
and you have had time to review it, let's arrange a
meeting at your company to discuss these strategies.
.P
Thanks again for giving us the opportunity to bid on this
documentation project. I hope we can decide upon a
strategy and get started as soon as possible in order to
have the manual ready in time for the first customer
shipment. I look forward to meeting with you towards the
end of next week.
.SP
Sincerely,
.SP 2
Fred Caslon
- 1 -
April 1, 1987
Sincerely,
Fred Caslon
We have worked through some of the problems presented by a very simple one-
page letter. As we move on, we will be describing specialized macros that address the
problems of multiple page documents, such as proposals and reports. In many ways,
the macros for more complex documents are the feature performers in a macro package,
the ones that really convince you that a markup language is worth learning.
When you format with nroff and print on a line printer, you can put emphasis on
individual words or phrases by underlining or overstriking. When you are using
troff and send your output to a laser printer or typesetter, you can specify variations
of type, font, and point size based on the capabilities of the output device.
.B Bold
.I Italic
.R Roman
Each macro prints a single argument in a particular font. You might code a single sen-
tence as follows:
.B Alcuin
revitalizes an
.I age-old
tradition.
The printed sentence has a word in bold and one in italic. (In nroff, boldface is
simulated by overstriking, and italics by underlining.)
beautiful
.R
handwriting;
You've already seen that the first argument is changed to the selected font. If you
supply a second argument, it is printed in the previous font. Each macro takes up to six
arguments for alternating font changes. (An argument is set off by a space; a phrase
must be enclosed within quotation marks to be taken as a single argument.) A good use
for the alternate argument is to supply punctuation, especially because of the restriction
that you cannot begin an input line with a period.
its opposite is
.B cacography .
This example produces:
This produces:
This produces:
Alcuin uses three input devices, a light pen, a mouse, and a graphics tablet.
There are additional macros for selecting other main and alternate fonts. These macros
also take up to six arguments, displayed in alternate fonts:
If you are using nroff, specifying a bold font results in character overstrike;
specifying an italic font results in an underline for each character (not a continuous rule).
Overstriking and underlining can cause problems on some printers and terminals.
or absolute ones:
.S 12 14
By default, if you don’t specify vertical spacing, a relation of 2 points greater than the
point size will be maintained. A null value ("") does not change the current setting.
The new point size and vertical spacing remain in effect until you change them.
Simply entering the .S macro without arguments restores the previous settings:
.S
The mm package keeps track of the default, previous, and current values, making it
easy to switch between different settings using one of these three arguments:
D Default
P Previous
C Current
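The bookkeeping behind the D, P, and C arguments can be sketched as follows. This Python class is a hypothetical model of our own: the default of 10-point type on 12-point spacing is an assumption, as is the handling of signed strings such as "+2" as relative changes. It is meant only to make the rules above concrete.

```python
class PointSize:
    """Hypothetical model of the settings tracked by the .S macro."""

    def __init__(self, default_size=10, default_spacing=12):   # assumed defaults
        self.default = (default_size, default_spacing)
        self.current = self.default
        self.previous = self.default

    def _resolve(self, arg, index):
        if arg == "D":
            return self.default[index]
        if arg == "P":
            return self.previous[index]
        if arg in ("C", ""):                    # "" is the null value
            return self.current[index]
        if isinstance(arg, str) and arg[0] in "+-":
            return self.current[index] + int(arg)   # relative change
        return int(arg)                              # absolute value

    def s(self, size="P", spacing=None):
        """Apply .S and return the new (point size, vertical spacing)."""
        new_size = self._resolve(size, 0)
        if spacing is None:
            new_spacing = new_size + 2          # keep spacing 2 points larger
        else:
            new_spacing = self._resolve(spacing, 1)
        self.previous, self.current = self.current, (new_size, new_spacing)
        return self.current

p = PointSize()
print(p.s(18))      # (18, 20)
print(p.s(12))      # (12, 14)
print(p.s("D"))     # (10, 12)
print(p.s())        # (12, 14): no argument restores the previous settings
```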
In the following example for a letterhead, the company name is specified in 18-point
type and a tag line in 12-point type; then the default settings are restored:
.S 18
Caslon Inc.
.S 12
Communicating Expertise
.S D
Caslon Inc.
Communicating Expertise
You can also change the font along with the point size, using the .I macro described
previously. Following is the tag line in 12-point italic.
Communicating Expertise
A special-purpose macro in mm reduces by 1 point the point size of a specified string.
The .SM macro can be followed by one, two, or three strings. Only one argument is
reduced; which one depends upon how many arguments are given. If you specify one
or two arguments, the first argument will be reduced by 1 point:
using
.SM UNIX ,
you will find
The second argument is concatenated to the first argument, so that the comma immedi-
ately follows the word UNIX:
More about Displays
Broadly speaking, a display is any kind of information in the body of a document that
cannot be set as a normal paragraph. Displays can be figures, quotations, examples,
tables, lists, equations, or diagrams.
The display macros position the display on the page. Inside the display, you
might use other macros or preprocessors such as t b l or eqn. You might simply
have a block of text that deserves special treatment.
The display macros can be relied upon to provide
The default action of the .DS macro is to left justify the text block in no-fill mode. It
provides no indentation from the current margins.
You can specify a different format for a display by specifying up to three argu-
ments with the .DS macro. The syntax is:
0   L    No indent (default)
1   I    Indented
2   C    Center each line
3   CB   Center entire display
For consistency, the indent of displays is initially set to be the same as indented
paragraphs (five spaces in nroff and three ens in troff), although these values are
maintained independently in two different number registers, Pi and Si. (To change
the defaults, simply use the .nr request to put the desired value in the appropriate
register.)
A display can be centered in two ways: either each individual line in the display
is centered (C) or the entire display is centered as a block based on the longest line of
the display (CB).
For instance, the preceding list was formatted using tbl, but its placement was
controlled by the display macro.
.DS CB
.TS
table specifications
.TE
.DE
The fill mode argument is represented by either a number or a letter
The right indent argument is a numeric value that is subtracted from the right
margin. In nroff, this value is automatically scaled in ens. In troff, you can
specify a scaled number; otherwise, the default is ems.
The use of fill mode, along with other indented display options, can provide a
paragraph indented on both sides. This is often used in reports and proposals that quote
at length from another source. For example:
.P
I was particularly interested in the following comment
found in the product specification:
.DS I F 5
Users first need a brief introduction to what the product
does. Sometimes this is more for the benefit of people
who haven't yet bought the product, and
are just looking at the manual.
However, it also serves to put the rest of
the manual, and the product itself, in the proper context.
.DE
The result of formatting is:
The use of tabs often presents a problem outside of displays. Material that has
been entered with tabs in the input file should be formatted in no-fill mode, the default
setting of the display macros. The following table was designed using tabs to provide
the spacing:
.DF I
Dates Description of Task
This table appears in the output just as it looks in the file. If this material had not been
processed inside a display in no-fill mode, the columns would be improperly aligned.
3
4
5
.DF
Long Display
.DE
6
7
8
9
10
The following two formatted pages might be produced, assuming that there are a suffi-
cient number of lines in the display to cause a page break:
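[Formatted output: on page 1, lines 6 and 7 directly follow line 5; page 2 begins
with the long display, followed by lines 8, 9, and 10.]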
If there had been room on page 1 to fit the display, it would have been placed there, and
lines 6 and 7 would have followed the display, as they did in the input file.
If a static display had been specified, the display would be placed in the same
position on page 2, and lines 6 and 7 would have to follow it, leaving extra space at the
bottom of page 1. A floating display attempts to make the best use of the available
space on a page.
The formatter maintains a queue to hold floating displays that it has not yet output.
When the top of a page is encountered, the next display in the queue is output.
The queue is emptied in the order in which it was filled (first in, first out). Two
number registers, De and Df, allow you to control when displays are removed from
the queue and placed in position.
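The exact meaning of each register value depends on your version of mm, so treat the following as a sketch based on the standard mm documentation rather than a definitive setting. We assume here that setting De to 1 forces a page eject after each floating display is output, and that setting Df to 0 holds all floating displays until the end of the section or document.

```roff
.\" Assumption: De 1 ejects the page after each floating display.
.nr De 1
.\" Assumption: Df 0 holds floating displays until the end of
.\" the section or document instead of the next top of page.
.nr Df 0
.DF
Long Display
.DE
```

Check the register descriptions for your own system before relying on these values.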
At the end of a section, as indicated by the section macros .H and .HU (which
we will see shortly), or at the end of the input file, any floating displays that remain in
the queue will be placed in the document.
Display Labels
You can provide a title or caption for tables, equations, exhibits, and figures. In addi-
tion, the display can be labeled and numbered in sequence, as well as printed in a table
of contents at the end of the file. The following group of macros is available:
.EC Equation
.EX Exhibit
.FG Figure
.TB Table
All of these macros work the same way and are usually specified within a pair of
.DS/.DE macros, so that the title and the display appear on the same page. Each
macro can be followed by a title. If the title contains spaces, it should be enclosed
within quotation marks. The title of a table usually appears at the top of a table, so it
must be specified before the .TS macro that signals to tbl the presence of a table
(see Chapter 8).
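Putting these pieces together, a labeled table might be coded as in the following sketch. The title is taken from an example later in this section, and "table specifications" stands in for the actual tbl input.

```roff
.\" The label macro comes before .TS so that the title
.\" prints above the table; .DS/.DE keep them on one page.
.DS
.TB "Estimated Hours: June, July, and August"
.TS
table specifications
.TE
.DE
```

For a figure, where the caption normally goes underneath, the .FG macro would instead be placed after the body of the display and before .DE.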
Figure 1. Drawing with a Light Pen
The default format of the label can be changed slightly by setting the number
register Of to 1. This replaces the period with a dash.
Figure 1 - Drawing with a Light Pen
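A sketch of input that should produce a dashed label like this one follows. The placeholder line inside the display is ours; only the register setting and the .FG call matter here.

```roff
.\" Use a dash instead of a period in display labels.
.nr Of 1
.DS
(body of the figure goes here)
.FG "Drawing with a Light Pen"
.DE
```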
Second and third arguments, specified with the label macros, can be used to
modify or override the default numbering of displays. Basically, the second argument
is a literal and the third argument a numeric value that specifies what the literal means.
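If the third argument is:
0 The second argument is used as a prefix to the number.
1 The second argument is used as a suffix to the number.
2 The second argument replaces the number.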
Thus, a pair of related tables could be specified as 1a and 1b using the following labels:
.TB "Estimated Hours: June, July, and August" a 1
.TB "Estimated Hours: September and November," 1b 2
(These labels show two different uses of the third argument. Usually, you would con-
sistently use one technique or the other for a given set of tables.)
For tbl, the delimiters for tables are .TS/.TE. For eqn, the delimiters for
equations are .EQ/.EN. For pic, the delimiters for pictures or diagrams are
.PS/.PE. These pairs of delimiters indicate a block to be processed by a specific
preprocessor. You will find the information about each of the preprocessors in Chapters
8 through 10. As mentioned, the preprocessor creates the display, the display macros
position it, and the label macros add titles and a number.
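The division of labor can be sketched with a small pic drawing. The box and the figure title below are hypothetical, chosen only to show which lines belong to which step.

```roff
.\" Display macros: position the block and keep it together.
.DS
.\" Preprocessor delimiters: pic builds the drawing.
.PS
box "Alcuin"
.PE
.\" Label macro: adds the title and the figure number.
.FG "A Hypothetical Block Diagram"
.DE
```

The file would have to be run through pic before the formatter for the drawing to appear.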
Although it may seem a minor point, each of these steps is independent, and
because they are not fully integrated, there is some overlap.
The label macros, being independent of the preprocessors, do not make sure that a
display exists or check whether a table has been created with tbl. You can create a
two-column table using tabs or create a figure using character symbols and still give it a
label. Or you can create a table heading as the first line of your table and let tbl
process it (tbl won’t provide a number and the table won’t be collected for the table of
contents).
In tbl, you can specify a centered table and not use the .DS/.DE macros.
But, as a consequence, nroff/troff won’t make a very good attempt at keeping the
table together on one page, and you may have to manually break the page. It is
recommended that you use the display macros throughout a document, regardless of whether
you can get the same effect another way, because if nothing else you will achieve
consistency.
Formatting Lists
The mm macro package provides a variety of different formats for presenting a list of
items. You can select from four standard list types:
bulleted
dashed
numbered
alphabetized
In addition, you have the flexibility to create lists with nonstandard marks or text labels.
The list macros can also be used to produce paragraphs with a hanging indent.
Each list item consists of a special mark, letter, number, or label in a left-hand
column with a paragraph of text indented in a right-hand column.
Structuring a List
The list macros help to simplify what could be a much larger and tedious formatting
task. Here’s the coding for the bulleted list just shown:
.BL
.LI
bulleted
.LI
dashed
.LI
numbered
.LI
alphabetized
.LE
The structure of text in the input file has three parts: a list-initialization macro (.BL),
an item-mark macro (.LI), and a list-end macro (.LE).
First, you initialize the list, specifying the particular macro for the type of list that
you want. For instance, .BL initializes a bulleted list.
You can specify arguments with the list-initialization macro that change the
indentation of the text and turn off the automatic spacing between items in the list. We
will examine these arguments when we look at the list-initialization macros in more
detail later.
Next, you specify each of the items in the list. The item-mark macro, .LI, is
placed before each item. You can enter one or more lines of text following the macro.
.BL
.LI
Item 1
.LI
Item 2
.LI
Item 3
When the list is formatted, the .LI macro provides a line of space before each item.
(This line can be omitted through an argument to the list-initialization macro if you
want to produce a more compact list. We’ll be talking more about this in a moment.)
The .LI macro can also be used to override or prefix the current mark. If a
mark is supplied as the only argument, it replaces the current mark. For example:
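.LI o
Item 4
would produce:
o Item 4
If the mark is followed by a second argument of 1, the mark is prefixed to the current
mark. For example:
.LI - 1
Item 5
would produce:
-• Item 5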
A text label can also be supplied in place of the mark, but it presents some addi-
tional problems for the proper alignment of the list. We will look at text labels for
variable-item lists.
The .LI macro does not automatically provide spacing after each list item. An
argument of 1 can be specified if a line of space is desired.
The end of the list is marked by the list-end macro .LE. It restores page format-
ting settings that were in effect prior to the invocation of the last list-initialization
macro. The .LE macro does not output any space following the list unless you
specify an argument of 1. (Don’t specify this argument when the list is immediately
followed by a macro that outputs space, such as the paragraph macro.)
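As a short sketch, here is a list closed with the argument of 1. The sentence after the list is ordinary text, not a macro that outputs its own space, which is the case where the argument is useful.

```roff
.BL
.LI
Item 1
.LI
Item 2
.\" The argument of 1 asks .LE for a line of space after the list.
.LE 1
This sentence is separated from the list by a line of space.
```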
Be sure you are familiar with the basic structure of a list. A common problem is
not closing the list with .LE. Most of the time, this error causes the formatter to quit
at this point in the file. A less serious, but nonetheless frequent, oversight is omitting
the first .LI between the list-initialization macro and the first item in the list. The list
is output but the first item will be askew.
Here is a sample list:
.BL
.LI
Item 1
.LI
Item 2
.LI
Item 3
.LI o
Item 4
.LI - 1
Item 5
.LE
• Item 1
• Item 2
• Item 3
o Item 4
-• Item 5
Complete list structures can be nested within other lists up to six levels. Different
types of lists can be nested, making it possible to produce indented outline structures.
But, like nested if-then structures in a program, make sure you know which level you
are at and remember to close each list.
For instance, we could nest the bulleted list inside a numbered list. The list-
initialization macro .AL generates alphabetized and numbered lists.
.AL
.LI
Don't worry, we'll get to the list-initialization macro .AL.
You can specify five different variations of
alphabetic and numbered lists.
.BL
.LI
Item 1
.LI
Item 2
.LI
Item 3
.LE
.LI
We'll also look at variable-item lists.
.LE
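1. Don't worry, we'll get to the list-initialization macro .AL. You can specify five
   different variations of alphabetic and numbered lists.
   • Item 1
   • Item 2
   • Item 3
2. We'll also look at variable-item lists.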
You may already realize the ease with which you can make changes to a list. The
items in a list can be easily put in a new order. New items can be added to a numbered
list without readjusting the numbering scheme. A bulleted list can be changed to an
alphabetized list by simply changing the list-initialization macro. And you normally
don’t have to be concerned with a variety of specific formatting requests, such as setting
indentation levels or specifying spacing between items.
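To illustrate how small such a change is, here is a sketch of a two-item list switched from bullets to uppercase letters. Only the first line differs from the bulleted version; the item text is borrowed from an example later in this chapter.

```roff
.\" Was: .BL
.\" Changing the list-initialization macro is the whole edit.
.AL A
.LI
Loading a Font
.LI
Scaling a Font
.LE
```

The items should now be marked A and B instead of with bullets, with no other change to the input.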
On the other hand, because the structure of the list is not as easy to recognize in
the input file as it is in the formatted output, you may find it difficult to interpret com-
plicated lists, in particular ones that have been nested to several levels. The code-
checking program, checkmm, can help; in addition, you may want to format and print
repeatedly to examine and correct problems with lists.
Marked Lists
Long a standby of technical documents, a marked list clearly organizes a group of
related items and sets them apart for easy reading. A list of items marked by a bullet
(•) is perhaps the most common type of list. Another type of marked list uses a dash
(-). A third type of list allows the user to specify a mark, such as a square (□). The
list-initialization macros for these lists are:
.BL [text-indent] [1]
.DL [text-indent] [1]
.ML mark [text-indent] [1]
With the .BL macro, the text is indented the same amount as the first line of an
indented paragraph. A single space is maintained between the bullet and the text. The
bullet is right justified, causing an indent of several spaces from the left margin.
As you can see from this nroff-formatted output, the bullet is simulated in
nroff by a + overstriking an o:
⊕ GGS Technical Memo 3200
⊕ GGS Product Marketing Spec
⊕ Alcuin/UNIX interface definition
⊕ Programmer's documentation for Alcuin
Because the bullets produced by nroff are not always appropriate due to the
overstriking, a dashed list provides a suitable alternative. With the .DL macro, the
dash is placed in the same position as a bullet in a bulleted list. A single space is
maintained between the dash and the text, which, like the text with a bulleted list, is indented
by the amount specified in the number register for indented paragraphs (Pi).
The nroff formatter supplies a dash that is a single hyphen, and troff supplies
an em dash. Because the em dash is longer, and the dash is right justified, the
alignment with the left margin is noticeably different. It appears left justified in
troff; in nroff, the dash appears indented several spaces because it is smaller.
- Loading a Font
- Scaling a Font
You can specify a text indent and a second argument of 1 to inhibit spacing between
items.
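For instance, a compact version of the dashed list above might be coded as in this sketch. The indent of 6 is an arbitrary value chosen for illustration.

```roff
.\" First argument: text indent (6 is illustrative).
.\" Second argument: 1 inhibits the space between items.
.DL 6 1
.LI
Loading a Font
.LI
Scaling a Font
.LE
```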
With the .ML macro, you have to supply the mark for the list. Some possible
candidates are the square (enter \(sq to get □), the square root (enter \(sr to get
√), which resembles a check mark, and the gradient symbol (enter \(gr to get ∇).
The user-specified mark is the first argument.
.ML \(sq
Not all of the characters or symbols that you can use in troff will have the same
effect in nroff.
Unlike bulleted and dashed lists, text is not automatically indented after a user-
specified mark. However, a space is added after the mark. The following example of
an indented paragraph and a list, which specifies a square as a mark, has been formatted
using nroff. The square appears as a pair of brackets.
The user-supplied mark can be followed by a second argument that specifies a text
indent and a third argument of 1 to omit spacing between items.
The following example was produced using the list-initialization command:
.ML \(sq 5 1
The specified indent of 5 aligns the text with an indented paragraph:
If no arguments are specified, the .AL macro produces a numbered list. For instance,
we can code the following paragraph with the list-initialization macro .AL:
You can produce various list types by simply changing the type argument. You can
create a very useful outline format by nesting different types of lists. The example we
show of such an outline is one that is nested to four levels using I, A, 1, and a, in
that order. The rather complicated looking input file is shown in Figure 6-4 (indented
for easier viewing of each list, although it could not be formatted this way), and the
nroff-formatted output is shown in Figure 6-5.
Another list-initialization macro that produces a numbered list is .RL (reference
list). The only difference is that the reference number is surrounded by brackets ([1]).
The arguments have the same effect as those specified with the .AL macro. To
initialize a reference list with no spacing between items, use:
.RL "" 1
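A complete reference list follows the same three-part structure as the other lists. In this sketch, the two item texts are placeholders of our own.

```roff
.\" "" leaves the text indent at its default;
.\" 1 omits the space between items.
.RL "" 1
.LI
First work cited
.LI
Second work cited
.LE
```

The items should be marked [1] and [2] rather than 1. and 2.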
Variable-Item Lists
With a variable-item list, you do not supply a mark; instead, you specify a text label
with each .LI. One or more lines of text following .LI are used to form a block
paragraph indented from the label. If no label is specified, a paragraph with a hanging
indent is produced. The syntax is:
.VL text-indent [label-indent] [1]
.AL I
.LI
Quick Tour of Alcuin
    .AL A
    .LI
    Introduction to Calligraphy
    .LI
    Digest of Alcuin Commands
        .AL 1
        .LI
        Three Methods of Command Entry
            .AL a
            .LI
            Mouse
            .LI
            Keyboard
            .LI
            Light Pen
            .LE
        .LI
        Starting a Page
        .LI
        Drawing Characters
            .AL a
            .LI
            Choosing a Font
            .LI
            Switching Fonts
            .LE
        .LI
        Creating Figures
        .LI
        Printing
        .LE
    .LI
    Sample Illuminated Manuscripts
    .LE
.LI
Using Graphic Characters
    .AL A
    .LI
    Modifying Font Style
    .LI
    Drawing Your Own Font
    .LE
.LI
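I. Quick Tour of Alcuin
   A. Introduction to Calligraphy
   B. Digest of Alcuin Commands
      1. Three Methods of Command Entry
         a. Mouse
         b. Keyboard
         c. Light Pen
      2. Starting a Page
      3. Drawing Characters
         a. Choosing a Font
         b. Switching Fonts
      4. Creating Figures
      5. Printing
   C. Sample Illuminated Manuscripts
II. Using Graphic Characters
   A. Modifying Font Style
   B. Drawing Your Own Font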
get a paragraph with a hanging indent. If you want to print an item without a label,
specify a backslash followed by a space (\ ) or \0 after .LI. Similarly, if you want
to specify a label that contains a space, you should also precede the space with a
backslash and enclose the label within quotation marks:
.LI "point\ size"
or simply substitute a \0 for a space:
.LI point\0size
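A small variable-item list using such labels might look like the following sketch. The definitions are placeholder text of our own, and the indent of 15 matches the value used in the hanging-indent example.

```roff
.VL 15
.\" A label containing a space: escape it and quote the label.
.LI "point\ size"
The size of the type, measured in points.
.\" A one-word label needs no special treatment.
.LI typeface
The design of a set of characters.
.LE
```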
The first line of text is left justified (or indented by the amount specified in label
indent) and the remaining lines will be indented by the amount specified by text indent.
This produces a paragraph with a hanging indent:
.VL 15
.LI
There are currently 16 font dictionaries in the Alcuin
library. Any application may have up to 12 dictionaries
resident in memory at the same time.
.LE
When formatted, this item has a hanging indent of 15:
Headings
Earlier we used the list macros to produce an indented outline. That outline, indented
to four levels, is a visual representation of the structure of a document. Headings per-
form a related function, showing how the document is organized into sections and sub-
sections. In technical documentation and book-length manuscripts, having a structure
that is easily recognized by the reader is very important.
The simplest use of the .H macro is to specify the level as a number between 1 and 7
followed by the text that is printed as a heading. If the heading text contains spaces,
you should enclose it within quotation marks. A heading that is longer than a single
line will be wrapped on to the next line. A multiline heading will be kept together in
case of a page break.
If you specify a heading suffix, this text or mark will appear in the heading but
will not be collected for a table of contents.
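As a sketch, assuming the suffix is given as a third argument after the heading text, a heading with a hypothetical "(Preliminary)" suffix could be coded this way:

```roff
.\" Third argument (assumed position): the heading suffix.
.\" It prints with the heading but stays out of the contents.
.H 2 "Digest of Alcuin Commands" " (Preliminary)"
```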
A top-level heading is indicated by an argument of 1:
.H 1 "Quick Tour of Alcuin"
The result is a heading preceded by a heading-level number. The first-level heading has
the number 1.
1. Quick Tour of Alcuin
When another heading is specified at the same level, the heading-level number is
automatically incremented. If the next heading is at the second level:
.H 2 "Digest of Alcuin Commands"
it produces:
1.2 Digest of Alcuin Commands
Each time you go to a new (higher-numbered) level, .1 is appended to the number
representing the existing level. That number is incremented for each call at the same
level. When you back out of a level (for instance, from level 5 to 4), the counter for the
level you left (in this case, level 5) is reset to 0.
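The numbering rules above can be traced through a short sequence of headings. This is a sketch using section titles from the outline shown earlier; each comment gives the number we would expect under the default heading format.

```roff
.\" Expected: 1.
.H 1 "Quick Tour of Alcuin"
.\" Expected: 1.1 (new level, so .1 is appended)
.H 2 "Introduction to Calligraphy"
.\" Expected: 1.2 (same level, so the counter is incremented)
.H 2 "Digest of Alcuin Commands"
.\" Expected: 1.2.1
.H 3 "Three Methods of Command Entry"
.\" Expected: 1.3 (backing out resets the level 3 counter)
.H 2 "Sample Illuminated Manuscripts"
.\" Expected: 2.
.H 1 "Using Graphic Characters"
.\" Expected: 2.1
.H 2 "Modifying Font Style"
```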
An unnumbered heading is really a zero-level heading:
.H 0 "Introduction to Calligraphy"
A separate macro, .HU, has been developed for unnumbered headings, although
its effect is the same.
.HU "Introduction to Calligraphy"
Even though an unnumbered heading does not display a number, it increments the
counter for second-level headings. Thus, in the following example, the heading
"Introduction to Calligraphy" is unnumbered, but it has the same effect on the numbering
scheme as if it had been a second-level heading (1.1).
Introduction to Calligraphy
If you are going to intermix numbered and unnumbered headings, you can change
the number register Hu to the lowest-level heading that is in the document. By
changing Hu from 2 to a higher number:
.nr Hu 5
.H 1 "Quick Tour of Alcuin"
.HU "Introduction to Calligraphy"
.H 2 "Digest of Alcuin Commands"
the numbering sequence is preserved for the numbered heading following an unnum-
bered heading:
The basic issue in designing a heading style is to help the reader distinguish between
different levels of headings. For instance, in an outline form, different levels of indent
show whether a topic is a section or subsection. Using numbered headings is an effec-
tive way to accomplish this. If you use unnumbered headings, you probably want to
vary the heading style for each level, although, for practical purposes, you should limit
yourself to two or three levels.
First, let’s look at what happens if we use the default heading style.
The first two levels of headings are set up to produce italicized text in troff
and underlined text in nroff. After the heading, there is a blank line before the first
paragraph of text. In addition, a top-level heading has two blank lines before the head-
ing; all the other levels have a single line of space.
Alcuin revitalizes an age-old tradition. Calligraphy, quite simply, is the art of
beautiful handwriting.
Levels three through seven all have the same appearance. The text is italicized or
underlined and no line break occurs. Two blank lines are maintained before and after
the text of the heading. For example:
1.2.1.3 Light Pen The copyist’s pen and ink has been replaced by a light pen.
To change the normal appearance of headings in a document, you specify new
values for the two strings:
HF Heading font
HP Heading point size
You can specify individual settings for each level, up to seven values.
The font for each level of heading can be set by the string HF. The following
codes are used to select a font:
1 Roman
2 Italic
3 Bold
By default, the arguments for all seven levels are set to 2, resulting in italicized
headings in troff and underlining in nroff. Here the HF string specifies bold for
the top three levels followed by two italic levels:
.ds HF 3 3 3 2 2
If you do not specify a level, it defaults to 1. Thus, in the previous example, level 6
and 7 headings would be printed in a roman font.
The point size is set by the string HP. Normally, headings are printed in the
same size as the body copy, except for bold headings. A bold heading is reduced by 1
point when it is a standalone heading, as are the top-level headings. The HP string can
take up to seven arguments, setting the point size for each level.
.ds HP 14 14 12
If an argument is not given, or a null value or 0 is given, the default setting of 10 points
is used for that level. Point size can also be given relative to the current point size:
.ds HP +4 +4 +2
Ej Eject page
Hb Break follows heading
Hc Center headings
Hi Align text after heading
Hs Vertical spacing after heading
For each of these number registers, you specify the number of the level at which some
action is to be turned on or off.
The Ej register is set to the highest-level heading, usually 1, that should start on
a new page. Its default setting is 0. This ensures that the major sections of a document
will begin on their own page.
.nr Ej 1
The Hb register determines if a line break occurs after the heading. The Hs register
determines if a blank line is output after the heading. Both are set to 2 by default.
Settings of 2 mean that, for levels 1 and 2, the section heading is printed, followed by a
line break and a blank line separating the heading from the first paragraph of text. For
lower-level headings (an argument greater than 2), the first paragraph follows
immediately on the same line.
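For example, to have third-level headings stand on their own line as well, you might raise both registers by one. The choice of 3 is ours, for illustration:

```roff
.\" Break after headings at levels 1 through 3.
.nr Hb 3
.\" Blank line after headings at levels 1 through 3.
.nr Hs 3
```

With these settings, only headings at level 4 and below would still run into the first paragraph.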
The Hc register is set to the highest-level heading that you want centered.
Normally, this is not used with numbered headings and its default value is 0. However,
unnumbered heads are often centered. A setting of 2 will center first- and second-level
headings:
.nr Hc 2
With unnumbered headings, you also have to keep in mind that the value of Hc must be
greater than or equal to Hb and Hu. The heading must be on a line by itself; therefore
a break must be set in Hb for that level. The Hu register sets the level of an unnum-
bered heading to 2, requiring that Hc be at least 2 to have an effect on unnumbered
headings.
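A minimal sketch of a centered unnumbered heading follows, assuming Hb and Hu are left at their default value of 2 as described above:

```roff
.\" Center headings at levels 1 and 2. Hb (break) and Hu
.\" (level of unnumbered headings) are assumed to be 2.
.nr Hc 2
.HU "Introduction to Calligraphy"
```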
There really is no way, using these registers, to get the first and second levels left
justified and have the rest of the headings centered.
The number register Hi determines the paragraph type for a heading that causes a
line break (Hb). It can be set to one of three values:
0 Left justified
1 Paragraph type determined by Pt
2 Indented to align with first character in heading
1 Arabic
001 Arabic with leading zeros
A Uppercase alphabetic
a Lowercase alphabetic
I Uppercase roman
i Lowercase roman
Marks can be mixed for an outline style similar to the one we produced using the list
macros:
.HM I A 1 a i
When you use marks consisting of roman numerals or alphabetic characters, you might
not want the mark of the current level to be concatenated to the mark of the previous
level. Concatenation can be suppressed by setting the number register Ht to 1:
.HM I i
.nr Ht 1
Now, each heading in the list has only the mark representing that level:
i. Introduction to Calligraphy
Table of Contents
Getting a table of contents easily and automatically is almost reason enough to justify
all the energy, yours and the computer’s, that goes into text processing. You realize
that this is something that the computer was really meant to do.
When the table of contents page comes out of the printer, a writer attains a state
of happiness known only to a statistician who can give the computer a simple instruc-
tion to tabulate vast amounts of data and, in an instant, get a single piece of paper list-
ing the results.
The reason that producing a table of contents seems so easy is that most of the
work is performed in coding the document. That means entering codes to mark each
level of heading and all the figures, tables, exhibits, and equations. Processing a table
of contents is simply a matter of telling the formatter to collect the information that’s
already in the file.
There are only two simple codes to put in a file, one at the beginning and one at
the end, to generate a table of contents automatically.
At the beginning of the file, you have to set the number register Cl to the level
of headings that you want collected for a table of contents. For example, setting Cl to
2 saves first- and second-level headings.
Place the .TC macro at the end of the file. This macro actually does the
processing and formatting of the table of contents. The table of contents page is output at
the end of a document.
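In outline, a file set up for a table of contents might look like the following sketch. The headings are borrowed from earlier examples, and the body text between them is omitted.

```roff
.\" Collect first- and second-level headings.
.nr Cl 2
.H 1 "Quick Tour of Alcuin"
.H 2 "Digest of Alcuin Commands"
.\" Level 3 is below the Cl setting, so this heading
.\" should not appear in the table of contents.
.H 3 "Three Methods of Command Entry"
.\" Last line of the file: format the table of contents.
.TC
```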
A sample table of contents page follows. The header “CONTENTS” is printed
at the top of the page. At the bottom of the page, lowercase roman numerals are used
as page numbers.
CONTENTS
One blank line is output before each first-level heading. All first-level headings are left
justified. Lower-level headings are indented so that they line up with the start of text
for the previous level.
If you have included various displays in your document, and used the macros
.FG, .TB, and .EX to specify captions and headings for the displays, this informa-
tion is collected and output when the .TC macro is invoked. A separate page is
printed for each accumulated list of figures, tables, and exhibits. For example:
LIST OF TABLES
If you want the lists of displays to be printed immediately following the table of
contents (no page breaks), you can set the number register Cp to 1.
If you want to suppress the printing of individual lists, you can set the following
number registers to 0:
Lf If 0, no figures
Lt If 0, no tables
Lx If 0, no exhibits
In addition, there is a number register for equations that is set to 0 by default. If you
want equations marked by .EC to be listed, specify:
.nr Le 1
There is a set of strings, using the same names as the number registers, that defines the
titles used for the top of the lists:
Lf LIST OF FIGURES
Lt LIST OF TABLES
Lx LIST OF EXHIBITS
Le LIST OF EQUATIONS
You can redefine a string using the .ds (define string) request. For instance, we can
redefine the title for figures as follows:
.ds Lf LIST OF ALCUIN DRAWINGS
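These registers and strings can be combined. The following sketch shows one possible set of choices placed near the start of a file; the particular selection is ours, made only to show the settings working together.

```roff
.\" Print the lists right after the contents, no page breaks.
.nr Cp 1
.\" Suppress the list of exhibits.
.nr Lx 0
.\" Turn on the list of equations, which is off by default.
.nr Le 1
.\" Retitle the list of figures.
.ds Lf LIST OF ALCUIN DRAWINGS
```

The .TC macro at the end of the file would then pick up these settings when it formats the lists.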
Footnotes
A footnote is marked in the body of a document by the string \*F. It follows immedi-
ately after the text (no spaces).
in an article on desktop publishing.\*F
The string F supplies the number for the footnote. It is printed (using troff) as a
superscript in the text and its value is incremented with each use.
The .FS macro indicates the start, and .FE the end, of the text for the footnote.
These macros surround the footnote text that will appear at the bottom of the page. The
.FS macro is put on the line immediately following the marker.
.FS
"Publish or Perish: Start-up grabs early page language lead,"
\fIComputerworld\fR, April 21, 1986, p. 1.
.FE
You can use labels instead of numbers to mark footnotes. The label must be specified
as a mark in the text and as an argument with .FS.
...in accord with the internal specs.[APS]
.FS [APS]
"Alcuin Product Specification," March 1986
.FE
You can use both numbered and labeled footnotes in the same document. All the foot-
notes are collected and output at the bottom of each page underneath a short line rule.
If you are using troff, the footnote text will be set in a type size 2 points less than
the body copy.
If you want to change the standard format of footnotes, you can specify the .FD
macro. It controls hyphenation, text adjustment, indentation, and justification of the
label.
Normally, the text of a footnote is indented from the left margin and the mark or
label is left justified in relation to the start of the text. It is possible that a long footnote
could run over to the next page. Hyphenation is turned off so that a word will not be
broken at a page break. These specifications can be changed by giving a value between
0 and 11 as the first argument with .FD, as shown in Table 6-3.
Argument   Hyphenation   Adjust   Text Indent   Label Justification
0          no            yes      yes           left
1          yes           yes      yes           left
2          no            no       yes           left
3          yes           no       yes           left
4          no            yes      no            left
5          yes           yes      no            left
6          no            no       no            left
7          yes           no       no            left
8          no            yes      yes           right
9          yes           yes      yes           right
10         no            no       yes           right
11         yes           no       yes           right
The second argument for .FD, if 1, resets the footnote numbering counter to 1.
This can be invoked at the end of a section or paragraph to initiate a new numbering
sequence. If specified by itself, the first argument must be null:
.FD "" 1
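As a sketch of the first argument in use, the following line selects row 8 of Table 6-3. Compared with the normal format described above, the only change should be that the label is right justified.

```roff
.\" Row 8: no hyphenation, adjusted, indented text,
.\" with the footnote label right justified.
.FD 8
```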
References
A reference differs from a footnote in that all references are collected and printed on a
single page at the end of the document. In addition, you can label a reference so that
you can refer to it later.
You can also give .RS, as a string label argument, the name of a string that will be
assigned the current reference number. This string can be referenced later in the
document. For instance, if we had specified a string label in the previous example:
.RS As
we could refer back to the first reference in another place:
The output itself is a readable file which you can interpret
with the aid of the PostScript manual.\*(As
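The pieces fit together roughly as in the following sketch. We assume the usual mm convention in which the string Rf marks the reference in the text and .RF closes the reference text opened by .RS; the sentence and the reference text are placeholders of our own.

```roff
.\" Assumption: \*(Rf prints the reference mark in the text.
...as described in the language manual.\*(Rf
.\" As is the string label that receives the reference number.
.RS As
Text of the reference goes here.
.\" Assumption: .RF ends the reference text.
.RF
```

Later uses of \*(As would then print the same reference number without creating a new reference.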
At the end of the document, a reference page is printed. The title printed on the
reference page is defined in the string Rp. You can replace “REFERENCES” with
another title simply by redefining this string with .ds.
REFERENCES
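For example, a replacement title could be defined as follows. The wording of the title is ours, chosen only for illustration.

```roff
.\" Replace the default title of the reference page.
.ds Rp SOURCES CITED
```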
In a large document, you might want to print a list of references at the end of a chapter
or a long section. You can invoke the .RP macro anywhere in a document.
.RP
.H 1 "Detailed Outline of User Guide"
It will print the list of references on a separate page and reset the reference counter to 0.
A reset argument and a paging argument can be supplied to change these actions. The
reset argument is the first value specified with the .RP macro. It is normally 0,
resetting the reference counter to 1 so that each section is numbered independently. If
reference numbering should be maintained in sequence for the entire document, specify a
value of 1.
The paging argument is the second value specified. It controls whether or not a
page break occurs before and after the list. It is normally set to 0, putting the list on a
new page. Specifying a value of 3 suppresses the page break before and after the list;
the result is that the list of references is printed following the end of the section and the
next section begins immediately after the list. A value of 1 will suppress only the page
break that occurs after the list and a value of 2 will suppress only the page break that
occurs before the list.
If you want an effect opposite that of the default settings, specify:
.RP 1 3
The first argument of 1 saves the current reference number for use in the next section or
chapter. The second argument of 3 inhibits page breaks before and after the list of
references.
Extensions to mm
So far, we have covered most but not all of the features of the mm macro package.
We have not covered the Technical Memorandum macros, a set of specialized
macros for formatting technical memos and reports. Like the ones in the ms macro
package, these macros were designed for internal use at AT&T’s Bell Laboratories,
reflecting a company-wide set of standards. Anyone outside of Bell Labs will want to
make some modifications to the macros before using them. The Technical Memoran-
dum macros are a good example of employing a limited set of user macros to produce a
standard format. Seeing how they work will be especially important to those who are
responsible for implementing documentation standards for a group of people, some of
whom understand the basics of formatting and some of whom do not.
Writing or rewriting macros is only one part of the process of customizing mm.
The mm macros were designed as a comprehensive formatting system. As we’ve seen,
there are even macros to replace common primitive requests, like .sp. The developers
of mm recommend, in fact, that you not use nroff or troff requests unless
absolutely necessary, lest you interfere with the action of the macros.
Furthermore, as you will see if you print out the mm macros, the internal code of
mm is extraordinarily dense, and uses extremely un-mnemonic register names. This
makes it very difficult for all but the most experienced user to modify the basic
structure of the package. You can always add your own macros, as long as they don’t
conflict with existing macro and number register names, but you can’t easily go in and
change the basic macros that make up the mm package.
At the same time, the developers of mm have made it possible for the user to
make selective modifications, namely those for which mm provides mechanisms in
advance. There are two such mechanisms: parameters held in number registers and
strings, and user exit macros.
The mm package is very heavily parameterized. Almost every feature of the formatting
system (from the fonts in which different levels of heading are printed to the size of
indents and the amount of space above and below displays) is controlled by values in
number registers. By learning and modifying these number registers, you can make
significant changes to the overall appearance of your documents.
In addition, there are a number of values stored in strings. These strings are used
like number registers to supply default values to various macros.
The registers you are most likely to want to change follow. Registers marked
with a dagger can only be changed on the command line with the -r option (e.g.,
-rN4).
There are also some values that you would expect to be kept in number registers
that are actually kept in strings:
For example, placing the following register settings at the start of your document:
.nr Hc 1
.nr Hs 3
.nr Hb 4
.nr Hi 2
.ds HF 3 3 3 3 2 2 2
.ds HP 16 14 12 10 10 10 10
There isn’t space in this book for a comprehensive discussion of this topic. However, a
complete list of user-settable mm number registers is given in Appendix B. Study this
list, along with the discussion of the relevant macros, and you will begin to get a picture
of just how many facets of mm you can modify by changing the values in number
registers and strings.
The second feature, the provision of so-called “user exit macros” at various
points, is almost as ingenious. The following macros are available for user definition:
.HX .HY .HZ .PX .TX .TY
The .HX, .HY, and .HZ macros are associated with headings. The .HX macro is
executed at the start of each heading macro, .HY in the middle (to allow you to
respecify any settings, such as temporary indents, that were lost because of mm’s own
processing), and .HZ at the end.
By default, these macros are undefined. And, when troff encounters an undefined
macro name, it simply ignores it. These macros thus lie hidden in the code until
you define them. By defining these macros, you can supplement the processing of
headings without actually modifying the mm code. Before you define these macros, be
sure to study the mm documentation for details of how to use them.
Similarly, .PX is executed at the top of each page, just after .PH. Accordingly,
it allows you to perform additional top-of-page processing. (In addition, you can
redefine the .TP macro, which prints the standard header, because this macro is relatively
self-contained.)
There is a slightly different mechanism for generalized bottom-of-page processing.
The .BS/.BE macro pair can be used to enclose text that will be printed at the
bottom of each page, after any footnotes but before the footer. To remove this text after
you have defined it, simply specify an empty block.
The .VM (vertical margins) macro allows you to specify additional space at the
top of the page, bottom of the page, or both. For example:
.VM 3 3
will add three lines each to the top and bottom margins. The arguments to this macro
should be unscaled. The first argument applies to the top margin, the second to the bot-
tom.
The .TX and .TY macros allow you to control the appearance of the table of
contents pages. The .TX macro is executed at the top of the first page of the table of
contents, above the title; .TY is executed in place of the standard title (“CONTENTS”).
In Chapter 14, you will learn about writing macro definitions, which should give
you the information you need to write these supplementary “user exit macros.”
CHAPTER 7
Advanced Editing
The ex Editor
The ex editor is a line editor with its own complete set of editing commands.
Although it is simpler to make most edits with vi, the line orientation of ex is an
advantage when you are making large-scale changes to more than one part of a file.
With ex, you can move easily between files and transfer text from one file to another
in a variety of ways. You can search and replace text on a line-by-line basis, or
globally. You can also save a series of editing commands as a macro and access them with
a single keystroke.
Seeing how ex works when it is invoked directly will help take some of the
“mystery” out of line editors and make it more apparent to you how many ex
commands work.
Let’s open a file and try a few ex commands. After you invoke ex on a file,
you will see a message about the total number of lines in the file, and a colon command
prompt. For example:
$ ex intro
"intro" 20 lines, 731 characters
:
You won’t see any lines in the file, unless you give an ex command that causes one or
more lines to be printed.
All ex commands consist of a line address, which can simply be a line number,
and a command. You complete the command with a carriage return. A line number by
itself is equivalent to a print command for that line. So, for example, if you type the
numeral 1 at the prompt, you will see the first line of the file:
:1
Sometimes, to advance,
To print more than one line, you can specify a range of lines. Two line numbers are
specified, separated by commas, with no spaces in between them:
:1,3
Sometimes, to advance,
you have to go backward.
Alcuin is a computer graphics tool
The current line is the last line affected by a command. For instance, before we issued
the command 1,3, line 1 was the current line; after that command, line 3 became the
current line. It can be represented by a special symbol, a dot (.).
:.,+3
that lets you design and create hand-lettered, illuminated
manuscripts, such as were created in the Middle Ages.
The previous command results in three more lines being printed, starting with the
current line. A + or - specifies a positive or negative offset from the current line.
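Line-number addressing can also be tried outside the editor. The following is a minimal sketch that uses sed rather than ex, on the assumption that sed is available; sed accepts the same numeric addresses but has no notion of a current line, so only the :1 and :1,3 cases are mirrored. The sample text and the variable names are illustrative.

```shell
# Sample text standing in for the book's "intro" file (an assumption).
intro='Sometimes, to advance,
you have to go backward.
Alcuin is a computer graphics tool
that lets you design and create hand-lettered, illuminated
manuscripts, such as were created in the Middle Ages.'

# Like :1 in ex: print only line 1.
first=$(printf '%s\n' "$intro" | sed -n '1p')

# Like :1,3 in ex: print lines 1 through 3.
range=$(printf '%s\n' "$intro" | sed -n '1,3p')

printf '%s\n' "$first" '---' "$range"
```

The -n option suppresses sed's default printing, so only the addressed lines appear, much as ex prints nothing until a command asks it to.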
The ex editor has a command mode and an insert mode. To put text in a file,
you can enter the append or a command to place text on the line following the
current line. The insert or i command places text on the line above the current
line. Type in your text and when you are finished, enter a dot (.) on a line by itself:
:a
Monks, skilled in calligraphy,
labored to make copies of ancient
documents and preserve in a
library the works of many Greek and
Roman authors.
.
Entering the dot takes you out of insert mode and puts you back in command mode.
A line editor does not have a cursor, and you cannot move along a line of text to
a particular word. Apart from not seeing more of your file, the lack of a cursor (and
therefore cursor motion keys) is probably the most difficult thing to get used to. After
using a line editor, you long to get back to using the cw command in vi.
If you want to change a word, you have to move to the line that contains the
word, tell the editor which word on the line you want to change, and then provide its
replacement. You have to think this way to use the substitute or s command.
It allows you to substitute one word for another.
We can change the last word on the first line from tool to environment:
:1
Alcuin is a computer graphics tool
:s/tool/environment/
Alcuin is a computer graphics environment
The word you want to change and its replacement are separated by slashes (/). As a
result of the substitute command, the line you changed is printed.
With a line editor, the commands that you enter affect the current line. Thus, we
made sure that the first line was our current line. We could also make the same change
by specifying the line number with the command:
:1s/environment/tool/
Alcuin is a computer graphics tool
If you specify an address, such as a range of line numbers, then the command will
affect the lines that you specify:
:1,20s/Alcuin/ALCUIN/
ALCUIN is named after an English scholar
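The effect of putting an address in front of the substitute command can be checked non-interactively. This sketch uses sed, whose s command follows the same address-plus-substitution form; the two-line sample text is an assumption, and the range is shortened to 1,2 to fit it.

```shell
# Two sample lines (assumed content, modeled on the examples above).
text='Alcuin is a computer graphics tool
Alcuin is named after an English scholar'

# Like :1s/tool/environment/ in ex: substitute on line 1 only.
one=$(printf '%s\n' "$text" | sed '1s/tool/environment/')

# Like :1,2s/Alcuin/ALCUIN/ in ex: act on every line in the range.
all=$(printf '%s\n' "$text" | sed '1,2s/Alcuin/ALCUIN/')

printf '%s\n' "$one" '---' "$all"
```

Unlike ex, sed prints every line of its input, changed or not, so the output shows the whole text after the edit.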
Another reason that knowing ex is useful is that sometimes when you are work-
ing in vi, you might unexpectedly find yourself using “open mode.” For instance, if
you press Q while in vi, you will be dropped into the ex editor. You can switch to
vi by entering the command vi at the colon prompt:
:vi
After you are in vi, you can execute any ex command by first typing a :
(colon). The colon appears on the bottom of the screen and what you type will be
echoed there. Enter an ex command and press RETURN to execute it.
Using ex Commands in vi
Many ex commands that perform normal editing operations have equivalent vi
commands that do the job in a simpler manner. Obviously, you will use dw or dd to
delete a single word or line rather than using the delete command in ex.
However, when you want to make changes that affect numerous lines, you will find that the
ex commands are very useful. They allow you to modify large blocks of text with a
single command.
Some of these commands and their abbreviations follow. You can use the full
command name or the abbreviation, whichever is easier to remember.
The substitute command best exemplifies the ex editor’s ability to make editing easier.
It gives you the ability to change any string of text every place it occurs in the file. To
perform edits on a global replacement basis requires a good deal of confidence in, as
well as full knowledge of, the use of pattern matching or “regular expressions.”
Although somewhat arcane, learning to do global replacements can be one of the most
rewarding experiences of working in the UNIX text-processing environment.
Other ex commands give you additional editing capabilities. For all practical
purposes, they can be seen as an integrated part of vi. Examples of these capabilities
are the commands for editing multiple files and executing UNIX commands. We will
look at these after we look at pattern-matching and global replacements.
The way to make these changes is with the search and replace commands in ex.
You can automatically replace a word (or string of characters) wherever it occurs in the
file. You have already seen one example of this use of the substitute command, when
we replaced Alcuin with ALCUIN:
:1,20s/Alcuin/ALCUIN/
There are really two steps in using a search and replace command. The first step is to
define the area in which a search will take place. The search can be specified locally to
cover a block of text or globally to cover the entire file. The second step is to specify,
using the substitute command, the text that will be removed and the text that will
replace it.
At first, the syntax for specifying a search and replace command may strike you
as difficult to learn, especially when we introduce pattern matching. Try to keep in
mind that this is a very powerful tool, one that can save you a lot of drudgery. Besides,
you will congratulate yourself when you succeed, and everyone else will think you are
very clever.
.    Current line
$    Last line
%    All lines (same as 1,$)
The following are examples that define the block of text that the substitute command
will act upon:
Within the search area, as defined in these examples, the substitute command will look
for one string of text and replace it with another string.
You can also use pattern matching to specify a place in the text. A pattern is
delimited by a slash both before and after it.
:/pattern1/,/pattern2/s    Search from the first line containing pattern1 through the
                           first line containing pattern2
:.,/pattern/s              Search from the current line through the line containing
                           pattern
It is important to note that the action takes place on the entire line containing the pat-
tern, not simply the text up to the pattern.
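A pattern-delimited block can be sketched with sed, which accepts the same /pattern1/,/pattern2/ address form. One difference should be kept in mind: ex searches forward from the current line, whereas sed tests each input line in turn. The words START and STOP and the sample lines are assumptions chosen for illustration.

```shell
# Five sample lines; START and STOP are hypothetical marker words.
text='roman one
START roman two
roman three
roman STOP here
roman five'

# Substitute only on lines from the one containing START
# through the one containing STOP, inclusive.
out=$(printf '%s\n' "$text" | sed '/START/,/STOP/s/roman/Roman/')

printf '%s\n' "$out"
```

On the fourth line the word roman comes before STOP and is still changed, because the address selects the whole line containing the pattern.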
Combined with a line address, this command searches all the lines within the block of
text. But it only replaces the first occurrence of the pattern on each line. For instance,
if we specified a substitute command replacing roman with Roman in the following
line:
after the roman hand. In teaching the roman script
only the first, not the second, occurrence of the word would be changed.
To specify each occurrence on the line, you have to add a g at the end of the
command:
:s/roman/Roman/g
This command changes every occurrence of roman to Roman on the current line.
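The difference the trailing g makes can be seen by running both forms on the sample line. This is a sketch using sed, whose s command treats the g flag the same way; the variable names are illustrative.

```shell
# The sample line from the text above.
line='after the roman hand. In teaching the roman script'

# Without g: only the first occurrence on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/roman/Roman/')

# With g: every occurrence on the line is replaced.
every=$(printf '%s\n' "$line" | sed 's/roman/Roman/g')

printf '%s\n' "$first_only" "$every"
```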
Using search and replace is much faster than finding each instance of a string and
replacing it individually. It has many applications, especially if you are a poor speller.
So far, we have replaced one word with another word. Usually, it’s not that easy.
A word may have a prefix or suffix that throws things off. In a while, we will look at
pattern matching. This will really expand what you are able to do. But first, we want
to look at how to specify that a search and replace take place globally in a file.
Confirming Substitutions
It is understandable if you are over-careful when using a search and replace command.
It does happen that what you get is not what you expected. You can undo any search
and replacement command by entering u. But you don’t always catch undesired
changes until it is too late to undo them. Another way to protect your edited file is to
save the file with :w before performing a replacement. Then, at least you can quit the
file without saving your edits and go back to where you were before the change was
made. You can also use :e! to read in the previous version of the buffer.
It will display the entire line where the string has been located and the string itself will
be marked by a series of carets (^^^).
copyists at his school
            ^^^
If you want to make the replacement, you must enter y and press RETURN.
If you don’t want to make a change, simply press RETURN.
this can be used for invitations, signs, and menus.
 ^^^
The combination of the vi commands // (repeat last search) and . (repeat last
command) is also an extraordinarily useful (and quick) way to page through a file and make
repetitive changes that require a judgment call rather than an absolute global replace-
ment.
This command searches all lines and replaces each occurrence on a line.
There is another way to do this, which is slightly more complex but has other
benefits. The pattern is specified as part of the address, preceded by a g indicating that
the search is global:
:g/Alcuin/s//ALCUIN/g
It selects all lines containing the pattern Alcuin and replaces every occurrence of that
pattern with ALCUIN. Because the search pattern is the same as the word you want to
change, you don’t have to repeat it in the substitute command.
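The reuse of the search pattern by an empty // can be sketched with sed, which follows the same convention: an empty regular expression stands for the last one used. The sample lines are assumptions.

```shell
# One line with two occurrences, one line with none (assumed sample text).
text='Alcuin, and Alcuin again
no match here'

# /Alcuin/ selects the lines; the empty // in s reuses that pattern.
out=$(printf '%s\n' "$text" | sed '/Alcuin/s//ALCUIN/g')

printf '%s\n' "$out"
```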
The extra benefit that this gives is the ability to search for a pattern and then
make a different substitution. We call this context-sensitive replacement.
The gist of this command is globally search for a pattern:
:g/pattern/
Replace it:
:g/pattern/s//
:g/pattern/s/string/
For example, we use the macro .BX to draw a box around the name of a special key.
To show an ESCAPE key in a manual, we enter:
.BX Esc
Suppose we had to change Esc to ESC, but we didn’t want to change any references to
Escape in the text. We could use the following command to make the change:
:g/BX/s/Esc/ESC/
This command might be phrased: “Globally search for each instance of BX and on
those lines substitute the Esc with ESC”. We didn’t specify g at the end of the
command because we would not expect more than one occurrence per line.
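A context-sensitive replacement of this kind can be tried on two lines of sample input. The sketch below uses sed, where a pattern address in front of s plays the role of g/pattern/; the second sample line is an assumption.

```shell
# A .BX macro line and an ordinary text line that mentions Escape.
text='.BX Esc
Press Escape to leave insert mode.'

# Change Esc to ESC only on lines that contain BX.
out=$(printf '%s\n' "$text" | sed '/BX/s/Esc/ESC/')

printf '%s\n' "$out"
```

The word Escape on the second line contains Esc, but the line is never selected, so it is left alone.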
Actually, after you get used to this syntax, and admit that it is a little awkward,
you may begin to like it.
Pattern Matching
If you are familiar with grep, then you know something about regular expressions. In
making global replacements, you can search not just for fixed strings of characters, but
also for patterns of words, referred to as regular expressions.
When you specify a literal string of characters, the search might turn up other
occurrences that you didn’t want to match. The problem with searching for words in a
file is that a word can be used in many different ways. Regular expressions help you
conduct a search for words in context.
Regular expressions are made up by combining normal characters with a number
of special characters. The special characters and their use follow.*
*\( and \), and \{n,m\} are not supported in all versions of vi. \<, \>, \u, \U, \l, and \L are supported
only in vi/ex, and not in other programs using regular expressions.
Unless you are already familiar with UNIX’s wildcard characters, this list of special
characters probably looks complex. A few examples should make things clearer. In the
examples that follow, a square (□) is used to mark a blank space.
Let’s follow how you might use some special characters in a replacement. Sup-
pose you have a long file and you want to substitute the word balls for the word ball
throughout that file. You first save the edited buffer with :w, then try the global
replacement:
:g/ball/s//balls/g
When you continue editing, you notice occurrences of words such as ballsoon,
globallsy, and ballss. Returning to the last saved buffer with :e!, you now try specifying
a space after ball to limit the search:
:g/ball□/s//balls□/g
But this command misses the occurrences ball., ball,, ball:, and so on.
:g/\<ball\>/s//balls/g
By surrounding the search pattern with \< and \>, we specify that the pattern should
only match entire words, with or without a subsequent punctuation mark. Thus, it does
not match the word balls if it already exists.
Because the \< and \> are only available in ex (and thus vi), you may have
occasions to use a longer form:
:g/ball\([□,.;:!?]\)/s//balls\1/g
This searches for and replaces ball followed by either a space (indicated by □) or any
one of the punctuation characters ,.;:!?. Additionally, the character that is matched
is saved using \( and \) and restored on the right-hand side with \1. The syntax
may seem complicated, but this command sequence can save you a lot of work in a
similar replacement situation.
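The saved-character technique can be checked on a single line of input. This is a sketch using sed, which supports \( \) and \1 in the same way; the sample sentence is an assumption, and a literal space is typed inside the brackets where the text prints a square.

```shell
# Sample sentence: ball before a comma, inside ballroom, and before a period.
line='The ball, the ballroom, and a ball.'

# Match ball followed by a space or one of , . ; : ! ?
# and put the saved character back after the added s.
out=$(printf '%s\n' "$line" | sed 's/ball\([ ,.;:!?]\)/balls\1/g')

printf '%s\n' "$out"
```

The word ballroom is untouched because r is not in the bracketed set. A ball at the very end of a line, with no character after it, would also be missed by this pattern.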
mgibox routine
mgrbox routine
mgabox routine
If you want to save the prefixes, but want to change the name box to square, either of
the following replacement commands will do the trick:
:g/mg\([iar]\)box/s//mg\1square/
The global replacement keeps track of whether an i, a, or r is saved, so that only
box is changed to square. This has the same effect as the previous command:
:g/mg[iar]box/s/box/square/g
The result is:
mgisquare routine
mgrsquare routine
mgasquare routine
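Both forms of the replacement can be compared side by side. The sketch below uses sed equivalents of the two commands, with the three routine names from the text as input; the variable names are illustrative.

```shell
# The three routine names from the text.
names='mgibox routine
mgrbox routine
mgabox routine'

# Form 1: save the prefix letter with \( \) and restore it with \1.
saved=$(printf '%s\n' "$names" | sed 's/mg\([iar]\)box/mg\1square/')

# Form 2: select lines matching the full name, then change only box.
scoped=$(printf '%s\n' "$names" | sed '/mg[iar]box/s/box/square/g')

printf '%s\n' "$saved"
```

The two forms agree on this input. They would differ on a selected line that held another box elsewhere, since the second form changes every box on that line.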