UNIX®

TEXT PROCESSING
HOWARD W. SAMS & COMPANY
HAYDEN BOOKS

Related Titles

Advanced C Primer++
Stephen Prata, The Waite Group

Discovering MS-DOS®
Kate O’Day, The Waite Group

Microsoft® C Programming for the IBM®
Robert Lafore, The Waite Group

MS-DOS® Bible
Steven Simrin, The Waite Group

MS-DOS® Developer’s Guide
John Angermeyer and Kevin Jaeger, The Waite Group

Tricks of the MS-DOS® Masters
John Angermeyer, Rich Fahringer, Kevin Jaeger, and Dan Shafer, The Waite Group

Inside XENIX®
Christopher L. Morgan, The Waite Group

UNIX® Primer Plus
Mitchell Waite, Donald Martin, and Stephen Prata, The Waite Group

UNIX® System V Primer, Revised Edition
Mitchell Waite, Donald Martin, and Stephen Prata, The Waite Group

Advanced UNIX®: A Programmer’s Guide
Stephen Prata, The Waite Group

UNIX® Shell Programming Language
Rod Manis and Marc Meyer

UNIX® System V Bible
Stephen Prata and Donald Martin, The Waite Group

UNIX® Communications
Bryan Costales, The Waite Group

C with Excellence: Programming Proverbs
Henry F. Ledgard with John Tauer

C Programmer’s Guide to Serial Communications
Joe Campbell

Hayden Books UNIX System Library

UNIX® Shell Programming
Stephen G. Kochan and Patrick H. Wood

UNIX® System Security
Patrick H. Wood and Stephen G. Kochan

UNIX® System Administration
David Fiedler and Bruce H. Hunter

Exploring the UNIX® System
Stephen G. Kochan and Patrick H. Wood

Programming in C
Stephen G. Kochan

Topics in C Programming
Stephen G. Kochan and Patrick H. Wood

For the retailer nearest you, or to order directly from the publisher,
call 800-428-SAMS. In Indiana, Alaska, and Hawaii call 317-298-5699.
TEXT PROCESSING

DALE DOUGHERTY AND TIM O’REILLY


and the staff of O'Reilly & Associates, Inc.

CONSULTING EDITORS:

Stephen G. Kochan and Patrick H. Wood

HAYDEN BOOKS
A Division of Howard W. Sams & Company
4300 West 62nd Street
Indianapolis, Indiana 46268 USA
Copyright © 1987 O’Reilly & Associates, Inc.
FIRST EDITION
SECOND PRINTING - 1988
All rights reserved. No part of this book shall be reproduced, stored in a retrieval system, or
transmitted by any means, electronic, mechanical, photocopying, recording, or otherwise, without
written permission from the publisher. No patent liability is assumed with respect to the use of
the information contained herein. While every precaution has been taken in the preparation of
this book, the publisher assumes no responsibility for errors or omissions. Neither is any liability
assumed for damages resulting from the use of the information contained herein.
International Standard Book Number: 0-672-46291-5
Library of Congress Catalog Card Number: 87-60537
Acquisitions Editor: Therese Zak
Editor: Susan Pink Bussiere
Cover: Visual Graphic Services, Indianapolis
Design by Jerry Bates
Illustration by Patrick Sarles
Typesetting: O’Reilly & Associates, Inc.
Printed in the United States of America

Trademark Acknowledgements
All terms mentioned in this book that are known to be trademarks or service marks are listed
below. Howard W. Sams & Co. cannot attest to the accuracy of this information. Use of a term
in this book should not be regarded as affecting the validity of any trademark or service mark.
Apple is a registered trademark and Apple Laserwriter is a trademark of Apple Computer, Inc.
devps is a trademark of Pipeline Associates, Inc.
Merge/286 and Merge/386 are trademarks of Locus Computing Corp.
DDL is a trademark of Imagen Corp.
Helvetica and Times Roman are registered trademarks of Allied Corp.
IBM is a registered trademark of International Business Machines Corp.
Interpress is a trademark of Xerox Corp.
LaserJet is a trademark of Hewlett-Packard Corp.
Laserwriter is a trademark of Apple Computer, Inc.
Linotronic is a trademark of Allied Corp.
Macintosh is a trademark licensed to Apple Computer, Inc.
Microsoft is a registered trademark of Microsoft Corp.
MKS Toolkit is a trademark of Mortice Kern Systems, Inc.
Multimate is a trademark of Multimate International Corp.
Nutshell Handbook is a trademark of O’Reilly & Associates, Inc.
PC-Interface is a trademark of Locus Computing Corp.
PostScript is a trademark of Adobe Systems, Incorporated.
PageMaker is a registered trademark of Aldus Corporation.
SoftQuad Publishing Software and SQtroff are trademarks of SoftQuad Inc.
WordStar is a registered trademark of MicroPro International Corp.
UNIX is a registered trademark of AT&T.
VP/ix is a trademark of Interactive Systems Corp. and Phoenix Technologies, Ltd.
CONTENTS

Preface

1 From Typewriters to Word Processors
A Workspace
Tools for Editing
Document Formatting
Printing
Other UNIX Text-Processing Tools

2 UNIX Fundamentals
The UNIX Shell
Output Redirection
Special Characters
Environment Variables
Pipes and Filters
Shell Scripts

3 Learning vi
Session 1: Basic Commands
Opening a File
Moving the Cursor
Simple Edits
Session 2: Moving around in a Hurry
Movement by Screens
Movement by Text Blocks
Movement by Searches
Movement by Line Numbers
Session 3: Beyond the Basics
Command-Line Options
Customizing vi
Edits and Movement
More Ways to Insert Text
Using Buffers
Marking Your Place
Other Advanced Edits

4 nroff and troff
What the Formatter Does
Using nroff
Using troff
The Markup Language
Turning Filling On and Off
Controlling Justification
Hyphenation
Page Layout
Page Transitions
Changing Fonts
A First Look at Macros

5 The ms Macros
Formatting a Text File with ms
Page Layout
Paragraphs
Changing Font and Point Size
Displays
Headings
Cover Sheet Macros
Miscellaneous Features
Page Headers and Footers
Problems on the First Page
Extensions to ms

6 The mm Macros
Formatting a Text File
Page Layout
Justification
Word Hyphenation
Displays
Changing Font and Point Size
More About Displays
Forcing a Page Break
Formatting Lists
Headings
Table of Contents
Footnotes and References
Extensions to mm

7 Advanced Editing
The ex Editor
Using ex Commands in vi
Write Locally, Edit Globally
Pattern Matching
Writing and Quitting Files
Reading In a File
Executing UNIX Commands
Editing Multiple Files
Word Abbreviation
Saving Commands with map

8 Formatting with tbl
Using tbl
Specifying Tables
A Simple Table Example
Laying Out a Table
Describing Column Formats
Changing the Format within a Table
Putting Text Blocks in a Column
Breaking Up Long Tables
Putting Titles on Tables
A tbl Checklist
Some Complex Tables

9 Typesetting Equations with eqn
A Simple eqn Example
Using eqn
Specifying Equations
Spaces in Equations
Using Braces for Grouping
Special Character Names
Special Symbols
Other Positional Notation
Diacritical Marks
Defining Terms
Quoted Text
Fine-Tuning the Document
Keywords and Precedence
Problem Checklist

10 Drawing Pictures
The pic Preprocessor
From Describing to Programming Drawings
pic Enhancements

11 A Miscellany of UNIX Commands
Managing Your Files
Viewing the Contents of a File
Searching for Information in a File
Proofing Documents
Comparing Versions of the Same Document
Manipulating Data
Cleaning Up and Backing Up
Compressing Files
Communications
Scripts of UNIX Sessions

12 Let the Computer Do the Dirty Work
Shell Programming
ex Scripts
Stream Editing (sed)
A Proofreading Tool You Can Build

13 The awk Programming Language
Invoking awk
Records and Fields
Testing Fields
Passing Parameters from a Shell Script
Changing the Field Separator
System Variables
Looping
awk Applications
Testing Programs

14 Writing nroff and troff Macros
Comments
Defining Macros
Macro Names
Macro Arguments
Nested Macro Definitions
Conditional Execution
Interrupted Lines
Number Registers
Defining Strings
Diversions
Environment Switching
Redefining Control and Escape Characters
Debugging Your Macros
Error Handling
Macro Style

15 Figures and Special Effects
Formatter Escape Sequences
Local Vertical Motions
Local Horizontal Motions
Absolute Motions
Line Drawing
Talking Directly to the Printer
Marking a Vertical Position
Overstriking Words or Characters
Tabs, Leaders, and Fields
Constant Spacing
Pseudo-Fonts
Character Output Translations
Output Line Numbering
Change Bars
Form Letters
Reading in Other Files or Program Output

16 What’s in a Macro Package?
Just What Is a Macro Package, Revisited
Building a Consistent Framework
Page Transitions
Page Transitions in ms
Some Extensions to the Basic Package
Other Exercises in Page Transition

17 An Extended ms Macro Package
Creating a Custom Macro Package
Structured Technical Documents
Figure and Table Headings
Lists, Lists, and More Lists
Source Code and Other Examples
Notes, Cautions, and Warnings
Table of Contents, Index, and Other End Lists

18 Putting It All Together
Saving an External Table of Contents
Index Processing
Let make Remember the Details
Where to Go from Here

A Editor Command Summary
B Formatter Command Summary
C Shell Command Summary
D Format of troff Width Tables
E Comparing mm and ms
F The format Macros
G Selected Readings

Index
Preface

Many people think of computers primarily as “number crunchers,” and think of word
processors as generating form letters and boilerplate proposals. That computers can be
used productively by writers, not just research scientists, accountants, and secretaries, is
not so widely recognized. Today, writers not only work with words, they work with
computers and the software programs, printers, and terminals that are part of a computer
system.
The computer has not simply replaced a typewriter; it has become a system for
integrating many other technologies. As these technologies are made available at a rea-
sonable cost, writers may begin to find themselves in new roles as computer program-
mers, systems integrators, data base managers, graphic designers, typesetters, printers,
and archivists.
The writer functioning in these new roles is faced with additional responsibilities.
Obviously, it is one thing to have a tool available and another thing to use it skillfully.
Like a craftsman, the writer must develop a number of specialized skills, gaining con-
trol over the method of production as well as the product. The writer must look for
ways to improve the process by integrating new technologies and designing new tools
in software.
In this book, we want to show how computers can be used effectively in the
preparation of written documents, especially in the process of producing book-length
documents. Surely it is important to learn the tools of the trade, and we will demon-
strate the tools available in the UNIX environment. However, it is also valuable to
examine text processing in terms of problems and solutions: the problems faced by a
writer undertaking a large writing project and the solutions offered by using the
resources and power of a computer system.
In Chapter 1, we begin by outlining the general capabilities of word-processing
systems. We describe in brief the kinds of things that a computer must be able to do
for a writer, regardless of whether that writer is working on a UNIX system or on an
IBM PC with a word-processing package such as WordStar or MultiMate. Then, having
defined basic word-processing capabilities, we look at how a text-processing system
includes and extends these capabilities and benefits. Last, we introduce the set of
text-processing tools in the UNIX environment. These tools, used individually or in
combination, provide the basic framework for a text-processing system, one that can be
custom-tailored to supply additional capabilities.
Chapter 2 gives a brief review of UNIX fundamentals. We assume you are
already somewhat acquainted with UNIX, but we included this information to make
sure that you are familiar with basic concepts that we will be relying on later in the
book.
Chapter 3 introduces the vi editor, a basic tool for entering and editing text.
Although many other editors and word-processing programs are available with UNIX,
vi has the advantage that it works, without modification, on almost every UNIX system
and with almost every type of terminal. If you learn vi, you can be confident that
your text editing skills will be completely transferable when you sit down at someone
else’s terminal or use someone else’s system.
Chapter 4 introduces the nroff and troff formatting programs. Because
vi is a text editor, not a word-processing program, it does only rudimentary formatting
of the text you enter. You can enter special formatting codes to specify how you want
the document to look, then format the text using either nroff or troff. (The
nroff formatter is used for formatting documents to the screen or to typewriter-like
printers; troff uses much the same formatting language, but has additional constructs
that allow it to produce more elaborate effects on typesetters and laser printers.)
In this chapter, we also describe the different types of output devices for printing
your finished documents. With the wider availability of laser printers, you need to
become familiar with many typesetting terms and concepts to get the most out of
troff’s capabilities.
The formatting markup language required by nroff and troff is quite complex,
because it allows detailed control over the placement of every character on the
page, as well as a large number of programming constructs that you can use to define
custom formatting requests or macros. A number of macro packages have been
developed to make the markup language easier to use. These macro packages define
commonly used formatting requests for different types of documents, set up default
values for page layout, and so on.
Although someone working with the macro packages does not need to know
about the underlying requests in the formatting language used by nroff and troff,
we believe that the reader wants to go beyond the basics. As a result, Chapter 4
introduces additional basic requests that the casual user might not need. However, your
understanding of what is going on should be considerably enhanced.
There are two principal macro packages in use today, ms and mm (named for the
command-line options to nroff and troff used to invoke them). Both macro
packages were once available with most UNIX systems; now, however, ms is chiefly
available on UNIX systems derived from Berkeley 4.x BSD, and mm is chiefly available on
UNIX systems derived from AT&T System V. If you are lucky enough to have both
macro packages on your system, you can choose which one you want to learn. Otherwise,
you should read either Chapter 5, The ms Macros, or Chapter 6, The mm Macros,
depending on which version you have available.
Chapter 7 returns to vi to consider its more advanced features. In addition, it
takes a look at how some of these features can support easy entry of formatting codes
used by nroff and troff.
Tables and mathematical equations pose special formatting problems. The
low-level nroff and troff commands for typesetting a complex table or equation
are extraordinarily complex. However, no one needs to learn or type these commands,
because two preprocessors, tbl and eqn, take a high-level specification of the table
or equation and do the dirty work for you. They produce a “script” of nroff or
troff commands that can be piped to the formatter to lay out the table or equations.
The tbl and eqn preprocessors are described in Chapters 8 and 9, respectively.
More recent versions of UNIX (those that include AT&T’s separate Documenter’s
Workbench software) also support a preprocessor called pic that makes it easier to
create simple line drawings with troff and include them in your text. We talk about
pic in Chapter 10.
Chapter 11 introduces a range of other UNIX text-processing tools: programs for
sorting, comparing, and in various ways examining the contents of text files. This
chapter includes a discussion of the standard UNIX spell program and the Writer’s
Workbench programs style and diction.
This concludes the first part of the book, which covers the tools that the writer
finds at hand in the UNIX environment. This material is not elementary. In places, it
grows quite complex. However, we believe there is a fundamental difference between
learning how to use an existing tool and developing skills that extend a tool’s capabili-
ties to achieve your own goals.
That is the real beauty of the UNIX environment. Nearly all the tools it provides
are extensible, either because they have built-in constructs for self-extension, like
the macro capability of nroff and troff, or because of the wonderful programming
powers of the UNIX command interpreter, the shell.
The second part of the book begins with Chapter 12, on editing scripts. There are
several editors in UNIX that allow you to write and save what essentially amount to
programs for manipulating text. The ex editor can be used from within vi to make
global changes or complex edits. The next step is to use ex on its own; and after you
do that, it is a small step to the even more powerful stream editor sed. After you have
mastered these tools, you can build a library of special-purpose editing scripts that
vastly extend your power over the recalcitrant words you have put down on paper and
now wish to change.
Chapter 13 discusses another program, awk, that extends the concept of a text
editor even further than the programs discussed in Chapter 12. The awk program is
really a database programming language that is appropriate for performing certain kinds
of text-processing tasks. In particular, we use it in this book to process output from
troff for indexing.
The next five chapters turn to the details of writing troff macros, and show
how to customize the formatting language to simplify formatting tasks. We start in
Chapter 14 by looking at the basic requests used to build macros, then go on in Chapter
15 to the requests for achieving various types of special effects. In Chapters 16 and 17,
we’ll take a look at the basic structure of a macro package and focus on how to define
the appearance of large documents such as manuals. We’ll show you how to define
different styles of section headings, page headers, footers, and so on. We’ll also talk
about how to generate an automatic table of contents and index, two tasks that take
you beyond troff into the world of shell programming and various UNIX
text-processing utilities.
To complete these tasks, we need to return to the UNIX shell in Chapter 18 and
examine in more detail the ways that it allows you to incorporate the many tools pro-
vided by UNIX into an integrated text-processing environment.
Numerous appendices summarize information that is spread throughout the text,
or that couldn’t be crammed into it.

***

Before we turn to the subject at hand, a few acknowledgements are in order. Though
only two names appear on the cover of this book, it is in fact the work of many hands.
In particular, Grace Todino wrote the chapters on tbl and eqn in their entirety, and
the chapters on vi and ex are based on O’Reilly & Associates’ Nutshell Handbook,
Learning the vi Editor, written by Linda Lamb. Other members of the O’Reilly
& Associates staff (Linda Mui, Valerie Quercia, and Donna Woonteiler) helped tirelessly
with copyediting, proofreading, illustrations, typesetting, and indexing.
Donna was new to our staff when she took on responsibility for the job of
copyfitting, that final stage in page layout made especially arduous by the many figures
and examples in this book. She and Linda spent many long hours getting
this book ready for the printer. Linda had the special job of doing the final consistency
check on examples, making sure that copyediting changes or typesetting errors
had not compromised the accuracy of the examples.
Special thanks go to Steve Talbott of Masscomp, who first introduced us to the
power of troff and who wrote the first versions of the extended ms macros, the format
shell script, and the indexing mechanism described in the second half of this book.
Steve’s help and patience were invaluable during the long road to mastery of the UNIX
text-processing environment.
We’d also like to thank Teri Zak, the acquisitions editor at Hayden Books, for her
vision of the Hayden UNIX series, and this book’s place in it.
In the course of this book’s development, Hayden was acquired by Howard Sams,
where Teri’s role was taken over by Jim Hill. Thanks also to the excellent production
editors at Sams, Wendy Ford, Lou Keglovitz, and especially Susan Pink Bussiere,
whose copyediting was outstanding.
Through it all, we have had the help of Steve Kochan and Pat Wood of Pipeline
Associates, Inc., consulting editors to the Hayden UNIX Series. We are grateful for
their thoughtful and thorough review of this book for technical accuracy. (We must, of
course, make the usual disclaimer: any errors that remain are our own.)
Steve and Pat also provided the macros to typeset the book. Our working drafts
were printed on an HP LaserJet printer, using ditroff and TextWare International’s
tplus postprocessor. Final typeset output was prepared with Pipeline Associates’
devps, which converted ditroff output to PostScript; the PostScript was used in
turn to drive a Linotronic L100 typesetter.
CHAPTER 1

From Typewriters to Word Processors

Before we consider the special tools that the UNIX environment provides for text pro-
cessing, we need to think about the underlying changes in the process of writing that are
inevitable when you begin to use a computer.
The most important features of a computer program for writers are the ability to
remember what is typed and the ability to allow incremental changes: no more retyping
from scratch each time a draft is revised. For a writer first encountering word-processing
software, no other features even begin to compare. The crudest command
structure, the most elementary formatting capabilities, will be forgiven because of the
immense labor savings that take place.
Writing is basically an iterative process. It is a rare writer who dashes out a fin-
ished piece; most of us work in circles, returning again and again to the same piece of
prose, adding or deleting words, phrases, and sentences, changing the order of thoughts,
and elaborating a single sentence into pages of text.
A writer working on paper periodically needs to clear the deck: to type a clean
copy, free of elaboration. As the writer reads the new copy, the process of revision
continues, a word here, a sentence there, until the new draft is as obscured by changes
as the first. As Joyce Carol Oates is said to have remarked: “No book is ever finished.
It is abandoned.”
Word processing first took hold in the office as a tool to help secretaries prepare
perfect letters, memos, and reports. As dedicated word processors were replaced with
low-cost personal computers, writers were quick to see the value of this new tool. In a
civilization obsessed with the written word, it is no accident that WordStar, a word-
processing program, was one of the first best sellers of the personal computer revolu-
tion.
As you learn to write with a word processor, your working style changes.
Because it is so easy to make revisions, it is much more forgivable to think with your
fingers when you write, rather than to carefully outline your thoughts beforehand and
polish each sentence as you create it.
If you do work from an outline, you can enter it first, then write your first draft by
filling in the outline, section by section. If you are writing a structured document such
as a technical manual, your outline points become the headings in your document; if
you are writing a free-flowing work, they can be subsumed gradually in the text as you
flesh them out. In either case, it is easy to write in small segments that can be moved
as you reorganize your ideas.
Watching a writer at work on a word processor is very different from watching a
writer at work on a typewriter. A typewriter tends to enforce a linear flow: you must
write a passage and then go back later to revise it. On a word processor, revisions are
constant: you type a sentence, then go back to change the sentence above. Perhaps
you write a few words, change your mind, and back up to take a different tack; or you
decide the paragraph you just wrote would make more sense if you put it ahead of the
one you wrote before, and move it on the spot.
This is not to say that a written work is created on a word processor in a single
smooth flow; in fact, the writer using a word processor tends to create many more drafts
than a compatriot who still uses a pen or typewriter. Instead of three or four drafts, the
writer may produce ten or twenty. There is still a certain editorial distance that comes
only when you read a printed copy. This is especially true when that printed copy is
nicely formatted and letter perfect.
This brings us to the second major benefit of word-processing programs: they
help the writer with simple formatting of a document. For example, a word processor
may automatically insert carriage returns at the end of each line and adjust the space
between words so that all the lines are the same length. Even more importantly, the
text is automatically readjusted when you make changes. There are probably commands
for centering, underlining, and boldfacing text.
The rough formatting of a document can cover a multitude of sins. As you read
through your scrawled markup of a preliminary typewritten draft, it is easy to lose track
of the overall flow of the document. Not so when you have a clean copy: the flaws of
organization and content stand out vividly against the crisp new sheets of paper.
However, the added capability to print a clean draft after each revision also puts
an added burden on the writer. Where once you had only to worry about content, you
may now find yourself fussing with consistency of margins, headings, boldface, italics,
and all the other formerly superfluous impedimenta that have now become integral to
your task.
As the writer gets increasingly involved in the formatting of a document, it
becomes essential that the tools help revise the document’s appearance as easily as its
content. Given these changes imposed by the evolution from typewriters to word pro-
cessors, let’s take a look at what a word-processing system needs to offer to the writer.

A Workspace
One of the most important capabilities of a word processor is that it provides a space in
which you can create documents. In one sense, the video display screen on your termi-
nal, which echoes the characters you type, is analogous to a sheet of paper. But the
workspace of a word processor is not so unambiguous as a sheet of paper wound into a
typewriter, which may be added neatly to the stack of completed work when finished, or
torn out and crumpled as a false start. From the computer’s point of view, your
workspace is a block of memory, called a buffer, that is allocated when you begin a
word-processing session. This buffer is a temporary holding area for storing your work
and is emptied at the end of each session.
To save your work, you have to write the contents of the buffer to a file. A file is
a permanent storage area on a disk (a hard disk or a floppy disk). After you have saved
your work in a file, you can retrieve it for use in another session.
When you begin a session editing a document that exists on file, a copy of the file
is made and its contents are read into the buffer. You actually work on the copy, mak-
ing changes to it, not the original. The file is not changed until you save your changes
during or at the end of your work session. You can also discard changes made to the
buffered copy, keeping the original file intact, or save multiple versions of a document
in separate files.
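To make the distinction between buffer and file concrete, here is a minimal Python sketch, assuming a document is simply a sequence of text lines. The class name EditBuffer and its methods are hypothetical and do not belong to any UNIX editor; the sketch shows only that editing touches an in-memory copy, and that the file on disk changes when, and only when, the buffer is written out, either over the original or to a new file as a separate version.

```python
import os
import tempfile


class EditBuffer:
    """Hypothetical model of an editor's workspace: an in-memory copy of a file."""

    def __init__(self, path):
        self.path = path
        self.lines = []  # a file that does not exist yet starts as an empty buffer
        if os.path.exists(path):
            # Reading copies the file's contents into memory;
            # the file on disk is not touched.
            with open(path, encoding="utf-8") as f:
                self.lines = f.read().splitlines()

    def write(self, path=None):
        """Save the buffer to its own file, or to another file as a new version."""
        target = path or self.path
        with open(target, "w", encoding="utf-8") as f:
            f.writelines(line + "\n" for line in self.lines)
        return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        draft = os.path.join(workdir, "draft")
        with open(draft, "w", encoding="utf-8") as f:
            f.write("No book is ever finished.\n")

        buf = EditBuffer(draft)
        buf.lines.append("It is abandoned.")  # change the copy, not the file

        with open(draft, encoding="utf-8") as f:
            print(len(f.read().splitlines()))  # prints 1: the file is unchanged

        buf.write(draft + ".v2")  # keep this version in a separate file
        buf.write()               # or save over the original
        with open(draft, encoding="utf-8") as f:
            print(len(f.read().splitlines()))  # prints 2: the file is now updated
```

Discarding your changes, in this model, amounts to never calling write(): the buffer simply goes away and the original file is left intact.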
Particularly when working with larger documents, the management of disk files
can become a major effort. If, like most writers, you save multiple drafts, it is easy to
lose track of which version of a file is the latest.
An ideal text-processing environment for serious writers should provide tools for
saving and managing multiple drafts on disk, not just on paper. It should allow the
writer to

work on documents of any length;
save multiple versions of a file;
save part of the buffer into a file for later use;
switch easily between multiple files;
insert the contents of an existing file into the buffer;
summarize the differences between two versions of a document.
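The last item on this list is the job of the UNIX diff command. As a rough illustration of the idea (not of diff itself), the following Python sketch uses the standard library's difflib module to summarize how one draft differs from another in unified format. The function name summarize_changes and the sample drafts are our own, chosen for illustration.

```python
import difflib


def summarize_changes(old_lines, new_lines, old_name="draft1", new_name="draft2"):
    """Return the differences between two drafts as unified-diff lines."""
    return list(
        difflib.unified_diff(
            old_lines, new_lines, fromfile=old_name, tofile=new_name, lineterm=""
        )
    )


draft1 = ["Writing is basically an iterative process.", "No book is ever finished."]
draft2 = ["Writing is basically an iterative process.", "No book is ever finished;"]

for line in summarize_changes(draft1, draft2):
    print(line)
```

In the output, lines beginning with - appear only in the first draft and lines beginning with + only in the second; unchanged lines are shown, prefixed by a space, for context.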

Most word-processing programs for personal computers seem to work best for short
documents such as the letters and memos that offices churn out by the millions each
day. Although it is possible to create longer documents, many features that would help
organize a large document such as a book or manual are missing from these programs.
However, long before word processors became popular, programmers were using
another class of programs called text editors. Text editors were designed chiefly for
entering computer programs, not text. Furthermore, they were designed for use by computer
professionals, not computer novices. As a result, a text editor can be more difficult to
learn, lacking many on-screen formatting features available with most word processors.
Nonetheless, the text editors used in program development environments can pro-
vide much better facilities for managing large writing projects than their office word-
processing counterparts. Large programs, like large documents, are often contained in
many separate files; furthermore, it is essential to track the differences between versions
of a program.
UNIX is a pre-eminent program development environment and, as such, it is also
a superb document development environment. Although its text editing tools at first
may appear limited in contrast to sophisticated office word processors, they are in fact
considerably more powerful.
Tools for Editing


For many, the ability to retrieve a document from a file and make multiple revisions
painlessly makes it impossible to write at a typewriter again. However, before you can
get the benefits of word processing, there is a lot to learn.
Editing operations are performed by issuing commands. Each word-processing
system has its own unique set of commands. At a minimum, there are commands to

move to a particular position in the document;


insert new text;
change or replace text;
delete text;
copy or move text.

To make changes to a document, you must be able to move to that place in the text
where you want to make your edits. Most documents are too large to be displayed in
their entirety on a single terminal screen, which generally displays 24 lines of text.
Usually only a portion of a document is displayed. This partial view of your document
is sometimes referred to as a window.* If you are entering new text and reach the bottom
line in the window, the text on the screen automatically scrolls (rolls up) to reveal
an additional line at the bottom. A cursor (an underline or block) marks your current
position in the window.
There are basically two kinds of movement:

scrolling new text into the window


positioning the cursor within the window

When you begin a session, the first line of text is the first line in the window, and the
cursor is positioned on the first character. Scrolling commands change which lines are
displayed in the window by moving forward or backward through the document.
Cursor-positioning commands allow you to move up and down to individual lines, and
along lines to particular characters.
After you position the cursor, you must issue a command to make the desired
edit. The command you choose indicates how much text will be affected: a character, a
word, a line, or a sentence.
Because the same keyboard is used to enter both text and commands, there must
be some way to distinguish between the two. Some word-processing programs assume
that you are entering text unless you specify otherwise; newly entered text either

*Some editors, such as emacs, can split the terminal screen into multiple windows. In addition, many
high-powered UNIX workstations with large bit-mapped screens have their own windowing software that
allows multiple programs to be run simultaneously in separate windows. For purposes of this book, we
assume you are using the vi editor and an alphanumeric terminal with only a single window.
0 From Typewriters to Word Processors 0 5

replaces existing text or pushes it over to make room for the new text. Commands are
entered by pressing special keys on the keyboard, or by combining a standard key with
a special key, such as the control key (CTRL).
Other programs assume that you are issuing commands; you must enter a com-
mand before you can type any text at all. There are advantages and disadvantages to
each approach. Starting out in text mode is more intuitive to those coming from a type-
writer, but may be slower for experienced writers, because all commands must be
entered by special key combinations that are often hard to reach and slow down typing.
(We’ll return to this topic when we discuss vi, a UNIX text editor.)
Far more significant than the style of command entry is the range and speed of
commands. For example, though it is heaven for someone used to a typewriter to be
able to delete a word and type in a replacement, it is even better to be able to issue a
command that will replace every occurrence of that word in an entire document. And,
after you start making such global changes, it is essential to have some way to undo
them if you make a mistake.
A word processor that substitutes ease of learning for ease of use by having fewer
commands will ultimately fail the serious writer, because the investment of time spent
learning complex commands can easily be repaid when they simplify complex tasks.
And when you do issue a complex command, it is important that it works as
quickly as possible, so that you aren’t left waiting while the computer grinds away.
The extra seconds add up when you spend hours or days at the keyboard, and, once
having been given a taste of freedom from drudgery, writers want as much freedom as
they can get.
Text editors were developed before word processors (in the rapid evolution of
computers). Many of them were originally designed for printing terminals, rather than
for the CRT-based terminals used by word processors. These programs tend to have
commands that work with text on a line-by-line basis. These commands are often more
obscure than the equivalent office word-processing commands.
However, though the commands used by text editors are sometimes more difficult
to learn, they are usually very effective. (The commands designed for use with slow
paper terminals were often extraordinarily powerful, to make up for the limited capabili-
ties of the input and output device.)
There are two basic kinds of text editors, line editors and screen editors, and both
are available in UNIX. The difference is simple: line editors display one line at a time,
and screen editors can display approximately 24 lines or a full screen.
The line editors in UNIX include ed, sed, and ex. Although these line edi-
tors are obsolete for general-purpose use by writers, there are applications at which they
excel, as we will see in Chapters 7 and 12.
The most common screen editor in UNIX is vi. Learning vi or some other
suitable editor is the first step in mastering the UNIX text-processing environment.
Most of your time will be spent using the editor.
UNIX screen editors such as vi and emacs (another editor available on many
UNIX systems) lack ease-of-learning features common in many word processors (there
are no menus and only primitive on-line help screens, and the commands are often
complex and nonintuitive), but they are powerful and fast. What’s more, UNIX line editors
such as ex and sed give additional capabilities not found in word processors: the
ability to write a script of editing commands that can be applied to multiple files. Such
editing scripts open new ranges of capability to the writer.
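As a minimal sketch of such a script, assuming a POSIX sed and hypothetical file names (fixes.sed, ch01, ch02), the editing commands can be kept in one file and applied unchanged to every chapter of a document:

```shell
#!/bin/sh
# A minimal sketch: one file of editing commands, applied to many files.
# The file names fixes.sed, ch01, and ch02 are hypothetical examples.

# Create two small sample chapter files to work on.
printf 'The word processer saved time.\n' > ch01
printf 'Each word processer has its own commands.\n' > ch02

# The editing script: one sed command per line.
cat > fixes.sed <<'EOF'
s/processer/processor/g
EOF

# Apply the same script to every file, replacing each original
# only if sed finishes successfully.
for f in ch01 ch02
do
    sed -f fixes.sed "$f" > "$f.new" && mv "$f.new" "$f"
done
```

Adding another correction means adding one line to fixes.sed; the loop that applies it does not change, however many files there are.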

Document Formatting
Text editing is wonderful, but the object of the writing process is to produce a printed
document for others to read. And a printed document is more than words on paper; it is
an arrangement of text on a page. For instance, the elements of a business letter are
arranged in a consistent format, which helps the person reading the letter identify those
elements. Reports and more complex documents, such as technical manuals or books,
require even greater attention to formatting. The format of a document conveys how
information is organized, assisting in the presentation of ideas to a reader.
Most word-processing programs have built-in formatting capabilities. Formatting
commands are intermixed with editing commands, so that you can shape your document
on the screen. Such formatting commands are simple extensions of those available to
someone working with a typewriter. For example, an automatic centering command
saves the trouble of manually counting characters to center a title or other text. There
may also be such features as automatic pagination and printing of headers or footers.
Text editors, by contrast, usually have few formatting capabilities. Because they
were designed for entering programs, their formatting capabilities tend to be oriented
toward the formats required by one or more programming languages.
Even programmers write reports, however. Especially at AT&T (where UNIX
was developed), there was a great emphasis on document preparation tools to help the
programmers and scientists of Bell Labs produce research reports, manuals, and other
documents associated with their development work.
Word processing, with its emphasis on easy-to-use programs with simple on-
screen formatting, was in its infancy. Computerized phototypesetting, on the other
hand, was already a developed art. Until quite recently, it was not possible to represent
on a video screen the variable type styles and sizes used in typeset documents. As a
result, phototypesetting has long used a markup system that indicates formatting instruc-
tions with special codes. These formatting instructions to the computerized typesetter
are often direct descendants of the instructions that were formerly given to a human
typesetter: center the next line, indent five spaces, boldface this heading.
The text formatter most commonly used with the UNIX system is called nroff.
To use it, you must intersperse formatting instructions (usually one- or two-letter codes
preceded by a period) within your text, then pass the file through the formatter. The
nroff program interprets the formatting codes and reformats the document “on the
fly” while passing it on to the printer. The nroff formatter prepares documents for
printing on line printers, dot-matrix printers, and letter-quality printers. Another
program called troff uses an extended version of the same markup language used by
nroff, but prepares documents for printing on laser printers and typesetters. We’ll
talk more about printing in a moment.
Although formatting with a markup language may seem to be a far inferior system
to the “what you see is what you get” (wysiwyg) approach of most office word-
processing programs, it actually has many advantages.
First, unless you are using a very sophisticated computer, with very sophisticated
software (what has come to be called an electronic publishing system, rather than a
mere word processor), it is not possible to display everything on the screen just as it
will appear on the printed page. For example, the screen may not be able to represent
boldfacing or underlining except with special formatting codes. WordStar, one of the
grandfathers of word-processing programs for personal computers, represents
underlining by surrounding the word or words to be underlined with the special control
character ^S (the character generated by holding down the control key while typing the letter
S). For example, the following title line would be underlined when the document is
printed:
^SWord Processing with WordStar^S

Is this really superior to the following nroff construct?


.ul
Text Processing with vi and nroff

It is perhaps unfair to pick on WordStar, an older word-processing program, but very
few word-processing programs can complete the illusion that what you see on the
screen is what you will get on paper. There is usually some mix of control codes with
on-screen formatting. More to the point, though, is the fact that most word processors
are oriented toward the production of short documents. When you get beyond a letter,
memo, or report, you start to understand that there is more to formatting than meets the
eye.
Although “what you see is what you get” is fine for laying out a single page, it is
much harder to enforce consistency across a large document. The design of a large
document is often determined before writing is begun, just as the plans for a house
are drawn up before anyone starts construction. The design is a plan for organizing a
document, arranging various parts so that the same types of material are handled in the
same way.
The parts of a document might be chapters, sections, or subsections. For instance,
a technical manual is often organized into chapters and appendices. Within each
chapter, there might be numbered sections that are further divided into three or four lev-
els of subsections.
Document design seeks to accomplish across the entire document what is
accomplished by the table of contents of a book. It presents the structure of a document and
helps the reader locate information.
Each of the parts must be clearly identified. The design specifies how they will
look, trying to achieve consistency throughout the document. The strategy might
specify that major section headings will be all uppercase, underlined, with three blank
lines above and two below, and secondary headings will be in uppercase and lowercase,
underlined, with two blank lines above and one below.
If you have ever tried to format a large document using a word processor, you
have probably found it difficult to enforce consistency in such formatting details as
these. By contrast, a markup language (especially one like nroff that allows you to
define repeated command sequences, or macros) makes it easy: the style of a heading
is defined once, and a code used to reference it. For example, a top-level heading might
be specified by the code .H1, and a secondary heading by .H2.
Even more significantly, if you later decide to change the design, you simply
change the definition of the relevant design elements. If you have used a word
processor to format the document as it was written, it is usually a painful task to go back and
change the format.
Some word-processing programs, such as Microsoft WORD, include features for
defining global document formats, but these features are not as widespread as they are
in markup systems.

Printing
The formatting capabilities of a word-processing system are limited by what can be out-
put on a printer. For example, some printers cannot backspace and therefore cannot
underline. For this discussion, we are considering four different classes of printers: dot
matrix, letter quality, phototypesetter, and laser.
A dot-matrix printer composes characters as a series of dots. It is usually suitable
for preparing interoffice memos and obtaining fast printouts of large files.

This paragraph was printed with a dot-matrix printer. It uses a print
head containing 9 pins, which are adjusted to produce the shape of each
character. More sophisticated dot-matrix printers have print heads
containing up to 24 pins. The greater the number of pins, the finer
the dots that are printed, and the more possible it is to fool the eye
into thinking it sees a solid character. Dot-matrix printers are also
capable of printing out graphic displays.

A letter-quality printer is more expensive and slower. Its printing mechanism
operates like a typewriter and achieves a similar result.

This paragraph was printed with a letter-quality
printer. It is essentially a
computer-controlled typewriter and, like a
typewriter, uses a print ball or wheel
containing fully formed characters.

A letter-quality printer produces clearer, easier-to-read copy than a dot-matrix printer.
Letter-quality printers are generally used in offices for formal correspondence as well as
for the final drafts of proposals and reports.
Until very recently, documents that needed a higher quality of printing than that
available with letter-quality printers were sent out for typesetting. Even if draft copy
was word-processed, the material was often re-entered by the typesetter, although many
typesetting companies can read the files created by popular word-processing programs
and use them as a starting point for typesetting.
This paragraph, like the rest of this book, was phototypeset. In photo-
typesetting, a photographic technique is used to print characters on film or
photographic paper. There is a wide choice of type styles, and the
characters are much more finely formed than those produced by a letter-quality
printer. Characters are produced by an arrangement of tiny dots, much like
a dot-matrix printer, but there are over 1000 dots per inch.

There are several major advantages to typesetting. The high resolution allows for the
design of aesthetically pleasing type. The shape of the characters is much finer. In
addition, where dot-matrix and letter-quality type is usually constant width (narrow
letters like i take up the same amount of space as wide ones like m), typesetters use
variable-width type, in which narrow letters take up less space than wide ones. It is also
possible to mix styles (for example, bold and italic) and sizes of type on the
same page.
Most typesetting equipment uses a markup language rather than a wysiwyg
approach to specify point sizes, type styles, leading, and so on. Until recently, the tech-
nology didn’t even exist to represent on a screen the variable-width typefaces that
appear in published books and magazines.
AT&T, a company with its own extensive internal publishing operation,
developed its own typesetting markup language and typesetting program, a sister to
nroff called troff (typesetter-roff). Although troff extends the capabilities of
nroff in significant ways, it is almost totally compatible with it.
Until recently, unless you had access to a typesetter, you didn’t have much use for
troff. The development of low-cost laser printers that can produce near
typeset-quality output at a fraction of the cost has changed all that.

This paragraph was produced on a laser printer. Laser printers produce
high-resolution characters (300 to 500 dots per inch), though they are not
quite as finely formed as phototypeset characters. Laser printers are not
only cheaper to purchase than phototypesetters, they also print on plain
paper, just like Xerox machines, and are therefore much cheaper to operate.
However, as is always the case with computers, you need the proper
software to take advantage of improved hardware capabilities.

Word-processing software (particularly that developed for the Apple Macintosh, which
has a high-resolution graphics screen capable of representing variable type fonts) is
beginning to tap the capabilities of laser printers. However, most of the
microcomputer-based packages still have many limitations. Nonetheless, a markup
language such as that provided by troff still provides the easiest and lowest-cost
access to the world of electronic publishing for many types of documents.
The point made previously, that markup languages are preferable to wysiwyg sys-
tems for large documents, is especially true when you begin to use variable size fonts,
leading, and other advanced formatting features. It is easy to lose track of the overall
format of your document and difficult to make overall changes after your formatted text
is in place. Only the most expensive electronic publishing systems (most of them based
on advanced UNIX workstations) give you both the capability to see what you will get
on the screen and the ability to define and easily change overall document formats.
Other UNIX Text-Processing Tools


Document editing and formatting are the most important parts of text processing, but
they are not the whole story. For instance, in writing many types of documents, such as
technical manuals, the writer rarely starts from scratch. Something is already written,
whether it be a first draft written by someone else, a product specification, or an out-
dated version of a manual. It would be useful to get a copy of that material to work
with. If that material was produced with a word processor or has been entered on
another system, UNIX’s communications facilities can transfer the file from the remote
system to your own.
Then you can use a number of custom-made programs to search through and
extract useful information. Word-processing programs often store text in files with dif-
ferent internal formats. UNIX provides a number of useful analysis and translation
tools that can help decipher files with nonstandard formats. Other tools allow you to
“cut and paste” portions of a document into the one you are writing.
As the document is being written, there are programs to check spelling, style, and
diction. The reports produced by those programs can help you see if there is any
detectable pattern in syntax or structure that might make a document more difficult for
the user than it needs to be.
Although many documents are written once and published or filed, there is also a
large class of documents (manuals in particular) that are revised again and again. Docu-
ments such as these require special tools for managing revisions. UNIX program
development tools such as SCCS (Source Code Control System) and diff can be
used by writers to compare past versions with the current draft and print out reports of
the differences, or generate printed copies with change bars in the margin marking the
differences.
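For example, assuming two hypothetical drafts named draft1 and draft2, diff reports only the lines that changed between them. This is a sketch of the basic comparison, not of SCCS or of change-bar output:

```shell
#!/bin/sh
# A minimal sketch: list the differences between two drafts with diff.
# The file names draft1, draft2, and changes are hypothetical.

printf 'Tools for Editing\nThere is a lot to learn.\n' > draft1
printf 'Tools for Editing\nThere is a great deal to learn.\n' > draft2

# diff writes only the lines that differ. Its exit status is 0 when the
# files are identical, 1 when they differ, and greater than 1 on trouble.
diff draft1 draft2 > changes
status=$?
echo "diff exit status: $status"
cat changes
```

In the saved report, the line 2c2 means that line 2 of the first file was changed to become line 2 of the second; lines marked < come from draft1, and lines marked > come from draft2.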
In addition to all of the individual tools it provides, UNIX is a particularly fertile
environment for writers who aren’t afraid of computers, because it is easy to write com-
mand files, or shell scripts, that combine individual programs into more complex tools
to meet your specific needs. For example, automatic index generation is a complex task
that is not handled by any of the standard UNIX text-processing tools. We will show
you ways to perform this and other tasks by applying the tools available in the UNIX
environment and a little ingenuity.
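As a small illustration of combining standard programs into a new tool, the sketch below lists the words in a file by how often they occur, which can help a writer spot overused words. The function name wordfreq and the file names are hypothetical; the pipeline uses only tr, sort, and uniq:

```shell
#!/bin/sh
# A minimal sketch of a "new tool" built from standard programs:
# list the words in a file, most frequent first.
# The function name wordfreq and the files sample.txt and counts are hypothetical.

wordfreq() {
    # 1. Put each word on its own line (runs of non-letters become one newline).
    # 2. Fold uppercase to lowercase so "The" and "the" count together.
    # 3. Sort, count adjacent duplicates, then sort by count, highest first.
    tr -cs 'A-Za-z' '\n' < "$1" |
    tr 'A-Z' 'a-z' |
    sort |
    uniq -c |
    sort -rn
}

printf 'The shell runs the programs.\nThe programs work together.\n' > sample.txt
wordfreq sample.txt > counts
cat counts
```

For this sample, the first line of counts reports 3 occurrences of "the" and the second reports 2 of "programs". Saving the function in a shell script file would make it available like any other command.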
We have two different objectives in this book. The first objective is that you
learn to use many of the tools available on most UNIX systems. The second objective
is that you develop an understanding of how these different tools can work together in a
document preparation system. We’re not just presenting a UNIX user’s manual, but
suggesting applications for which the various programs can be used.
To take full advantage of the UNIX text-processing environment, you must do
more than just learn a few programs. For the writer, the job includes establishing stan-
dards and conventions about how documents will be stored, in what format they should
appear in print, and what kinds of programs are needed to help this process take place
efficiently with the use of a computer. Another way of looking at it is that you have to
make certain choices prior to beginning a project. We want to encourage you to make
your own choices, set your own standards, and realize the many possibilities that are
open to a diligent and creative person.
In the past, many of the steps in creating a finished book were out of the hands of
the writer. Proofreaders and copyeditors went over the text for spelling and
grammatical errors. It was generally the printer who did the typesetting (a service usually
paid for by the publisher). At the print shop, a typesetter (a person) retyped the text and
specified the font sizes and styles. A graphic artist, performing layout and pasteup, made
many of the decisions about the appearance of the printed page.
Although producing a high-quality book can still involve many people, UNIX
provides the tools that allow a writer to control the process from start to finish. An
analogy is the difference between an assembly worker on a production line who views
only one step in the process and a craftsman who guides the product from beginning to
end. The craftsman has his own system of putting together a product, whereas the
assembly worker has the system imposed upon him.
After you are acquainted with the basic tools available in UNIX and have spent
some time using them, you can design additional tools to perform work that you think
is necessary and helpful. To create these tools, you will write shell scripts that use the
resources of UNIX in special ways. We think there is a certain satisfaction that comes
with accomplishing such tasks by computer. It seems to us to reward careful thought.
What programming means to us is that when we confront a problem that normally
submits only to tedium or brute force, we think of a way to get the computer to solve
the problem. Doing this often means looking at the problem in a more general way and
solving it in a way that can be applied again and again.
One of the most important books on UNIX is The UNIX Programming
Environment by Brian W. Kernighan and Rob Pike. They write that what makes UNIX
effective “is an approach to programming, a philosophy of using the computer.” At the
heart of this philosophy “is the idea that the power of a system comes more from the
relationships among programs than from the programs themselves.”
When we talk about building a document preparation system, it is this philosophy
that we are trying to apply. As a consequence, this is a system that has great flexibility
and gives the builders a feeling of breaking new ground. The UNIX text-processing
environment is a system that can be tailored to the specific tasks you want to accom-
plish. In many instances, it can let you do just what a word processor does. In many
more instances, it lets you use more of the computer to do things that a word processor
either can’t do or can’t do very well.
CHAPTER

UNIX Fundamentals

The UNIX operating system is a collection of programs that controls and organizes the
resources and activities of a computer system. These resources consist of hardware
such as the computer’s memory, various peripherals such as terminals, printers, and disk
drives, and software utilities that perform specific tasks on the computer system. UNIX
is a multiuser, multitasking operating system that allows the computer to perform a
variety of functions for many users. It also provides users with an environment in
which they can access the computer’s resources and utilities. This environment is
characterized by its command interpreter, the shell.
In this chapter, we review a set of basic concepts for users working in the UNIX
environment. As we mentioned in the preface, this book does not replace a general
introduction to UNIX. A complete overview is essential to anyone not familiar with the
file system, input and output redirection, pipes and filters, and many basic utilities. In
addition, there are different versions of UNIX, and not all commands are identical in
each version. In writing this book, we’ve used System V Release 2 on a Convergent
Technologies’ Miniframe.
These disclaimers aside, if it has been a while since you tackled a general intro-
duction, this chapter should help refresh your memory. If you are already familiar with
UNIX, you can skip or skim this chapter.
As we explain these basic concepts, using a tutorial approach, we demonstrate the
broad capabilities of UNIX as an applications environment for text-processing. What
you learn about UNIX in general can be applied to performing specific tasks related to
text-processing.

The UNIX Shell
As an interactive computer system, UNIX provides a command interpreter called a
shell. The shell accepts commands typed at your terminal, invokes a program to per-
form specific tasks on the computer, and handles the output or result of this program,
normally directing it to the terminal’s video display screen.

0 UNIX Fundamentals 0 13

UNIX commands can be simple one-word entries like the date command:
$ date
Tue Apr  8 13:23:41 EST 1987

Or their usage can be more complex, requiring that you specify options and arguments,
such as filenames. Although some commands have a peculiar syntax, many UNIX
commands follow this general form:
command option(s) argument(s)
A command identifies a software program or utility. Commands are entered in
lowercase letters. One typical command, ls, lists the files that are available in your
immediate storage area, or directory.
An option modifies the way in which a command works. Usually options are
indicated by a minus sign followed by a single letter. For example, ls -l modifies
what information is displayed about a file. The set of possible options is particular to
the command and generally only a few of them are regularly used. However, if you
want to modify a command to perform in a special manner, be sure to consult a UNIX
reference guide and examine the available options.
An argument can specify an expression or the name of a file on which the com-
mand is to act. Arguments may also be required when you specify certain options. In
addition, if more than one filename is being specified, special metacharacters (such as
* and ?) can be used to represent the filenames. For instance, ls -l ch* will
display information about all files that have names beginning with ch.
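A minimal sketch of these metacharacters at work, using a hypothetical directory named globdemo so that the patterns match only the sample files. Note that the shell, not ls, expands each pattern into a list of matching filenames before the command runs:

```shell
#!/bin/sh
# A minimal sketch of filename metacharacters. The shell expands each
# pattern into the matching filenames before the command runs.
# The directory name globdemo and the file names are hypothetical.

mkdir -p globdemo
touch globdemo/ch01 globdemo/ch02 globdemo/ch10 globdemo/notes

echo globdemo/ch*      # every name beginning with "ch"
echo globdemo/ch0?     # "ch0" followed by exactly one more character
ls -l globdemo/ch*     # ls receives the three expanded names as arguments
```

Here the first echo prints globdemo/ch01 globdemo/ch02 globdemo/ch10, and the second omits ch10, because ? matches exactly one character and * matches any number of characters.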
The UNIX shell is itself a program that is invoked as part of the login process.
When you have properly identified yourself by logging in, the UNIX system prompt
appears on your terminal screen.
The prompt that appears on your screen may be different from the one shown in
the examples in this book. There are two widely used shells: the Bourne shell and the
C shell. Traditionally, the Bourne shell uses a dollar sign ($) as a system prompt, and
the C shell uses a percent sign (%). The two shells differ in the features they provide
and in the syntax of their programming constructs. However, they are fundamentally
very similar. In this book, we use the Bourne shell.
Your prompt may be different from either of these traditional prompts. This is
because the UNIX environment can be customized and the prompt may have been
changed by your system administrator. Whatever the prompt looks like, when it
appears, the system is ready for you to enter a command.
When you type a command from the keyboard, the characters are echoed on the
screen. The shell does not interpret the command until you press the RETURN key.
This means that you can use the erase character (usually the DEL or BACKSPACE key)
to correct typing mistakes. After you have entered a command line, the shell tries to
identify and locate the program specified on the command line. If the command line
that you entered is not valid, then an error message is returned.
When a program is invoked and processing begun, the output it produces is sent
to your screen, unless otherwise directed. To interrupt and cancel a program before it
has completed, you can press the interrupt character (usually CTRL-C or the DEL key).
If the output of a command scrolls by too fast, you can suspend the output by
pressing the suspend character (usually CTRL-S) and resume it by pressing the resume
character (usually CTRL-Q).
Some commands invoke utilities that offer their own environment, with a
command interpreter and a set of special “internal” commands. A text editor is one such
utility, the mail facility another. In both instances, you enter commands while you are
“inside” the program. In these kinds of programs, you must use a command to exit
and return to the system prompt.
The return of the system prompt signals that a command is finished and that you
can enter another command. Familiarity with the power and flexibility of the UNIX
shell is essential to working productively in the UNIX environment.

Output Redirection
Some programs do their work in silence, but most produce some kind of result, or
output. There are generally two types of output: the expected result, referred to as
standard output, and error messages, referred to as standard error. Both types of output
are normally sent to the screen and appear to be indistinguishable. However, they can
be manipulated separately, a feature we will later put to good use.
Let’s look at some examples. The echo command is a simple command that
displays a string of text on the screen.
$ echo my name
my name

In this case, the input echo my name is processed and its output is my name.
The name of the command, echo, refers to a program that interprets the
command-line arguments as a literal expression that is sent to standard output. Let’s replace
echo with a different command called cat:
$ cat my name
cat: Cannot open my
cat: Cannot open name

The cat program takes its arguments to be the names of files. If these files existed,
their contents would be displayed on the screen. Because the arguments were not
filenames in this example, an error message was printed instead.
The output from a command can be sent to a file instead of the screen by using
the output redirection operator (>). In the next example, we redirect the output of the
echo command to a file named reminders.
$ echo Call home at 3:00 > reminders
$

No output is sent to the screen, and the UNIX prompt returns when the program is
finished. Now the cat command should work because we have created a file.
$ cat reminders
Call home at 3:00

The cat command displays the contents of the file named reminders on the
screen. If we redirect again to the same filename, we overwrite its previous contents:
0 UNIX Fundamentals 0 15

$ echo Pick up expense voucher > reminders


$ cat reminders
Pick up expense voucher
We can send another line to the file, but we have to use a different redirect operator to
append (>>) the new line at the end of the file:
$ echo Call home at 3:00 > reminders
$ echo Pick up expense voucher >> reminders
$ cat reminders
Call home at 3:00
Pick up expense voucher
The cat command is useful not only for printing a file on the screen, but for con-
catenating existing files (printing them one after the other). For example:
$ cat reminders todolist
Call home at 3:00
Pick up expense voucher
Proofread Chapter 2
Discuss output redirection
The combined output can also be redirected:
$ cat reminders todolist > do-now

The contents of both reminders and todolist are combined into do-now.
The original files remain intact.
If one of the files does not exist, an error message is printed, even though stan-
dard output is redirected:
$ rm todolist
$ cat reminders todolist > do-now
cat: todolist: not found
The files we’ve created are stored in our current working directory.
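Because standard output and standard error are separate streams, each can be sent to its own file. The following is a minimal sketch for a Bourne-compatible shell (sh, ksh, bash), where 2> redirects standard error; the C shell uses different syntax. It assumes the mktemp command is available and works in a throwaway directory so that it leaves no files behind.

```shell
#!/bin/sh
# Work in a throwaway directory; remove it when the script exits.
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

echo "Call home at 3:00" > reminders

# todolist does not exist, so cat reports an error.
# >  sends standard output (file descriptor 1) to do-now;
# 2> sends standard error  (file descriptor 2) to errors.
cat reminders todolist > do-now 2> errors

echo "--- do-now ---"
cat do-now
echo "--- errors ---"
cat errors
```

The wording of cat's error message differs from one system to another, but it ends up in errors rather than in do-now.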

Files and Directories


The UNIX file system consists of files and directories. Because the file system can
contain thousands of files, directories perform the same function as file drawers in a
paper file system. They organize files into more manageable groupings. The file sys-
tem is hierarchical. It can be represented as an inverted tree structure with the root
directory at the top. The root directory contains other directories that in turn contain
other directories.*

*In addition to subdirectories, the root directory can contain other file systems. A file system is the skeletal
structure of a directory tree, which is built on a magnetic disk before any files or directories are stored on it.
On a system containing more than one disk, or on a disk divided into several partitions, there are multiple
file systems. However, this is generally invisible to the user, because the secondary file systems are
mounted on the root directory, creating the illusion of a single file system.

On many UNIX systems, users store their files in the /usr file system. (As disk
storage has become cheaper and larger, the placement of user directories is no longer
standard. For example, on our system, /usr contains only UNIX software: user
accounts are in a separate file system called /work.)
Fred’s home directory is /usr/fred. It is the location of Fred’s account on
the system. When he logs in, his home directory is his current working directory. Your
working directory is where you are currently located and changes as you move up and
down the file system.
A pathname specifies the location of a directory or file on the UNIX file system.
An absolute pathname specifies where a file or directory is located off the root file sys-
tem. A relative pathname specifies the location of a file or directory in relation to the
current working directory.
To find out the pathname of our current directory, enter pwd.
$ pwd
/usr/fred
The absolute pathname of the current working directory is /usr/fred. The ls
command lists the contents of the current directory. Let’s list the files and
subdirectories in /usr/fred by entering the ls command with the -F option. This option
prints a slash (/) following the names of subdirectories. In the following example,
oldstuff is a directory, and notes and reminders are files.
$ ls -F
notes
oldstuff/
reminders
When you specify a filename with the ls command, it simply prints the name of
the file, if the file exists. When you specify the name of a directory, it prints the names
of the files and subdirectories in that directory.
$ ls reminders
reminders
$ ls oldstuff
ch01-draft
letter.212
memo
In this example, a relative pathname is used to specify oldstuff. That is, its
location is specified in relation to the current directory, /usr/fred. You could also
enter an absolute pathname, as in the following example:
$ ls /usr/fred/oldstuff
ch01-draft
letter.212
memo
Similarly, you can use an absolute or relative pathname to change directories using the
cd command. To move from /usr/fred to /usr/fred/oldstuff, you can
enter a relative pathname:

$ cd oldstuff
$ pwd
/usr/fred/oldstuff

The directory /usr/fred/oldstuff becomes the current working directory.


The cd command without an argument returns you to your home directory.
$ cd
When you log in, you are positioned in your home directory, which is thus your current
working directory. The name of your home directory is stored in a shell variable that is
accessible by prefacing the name of the variable (HOME) with a dollar sign ($). Thus:
$ echo $HOME
/usr/fred

You could also use this variable in pathnames to specify a file or directory in your
home directory.
$ ls $HOME/oldstuff/memo
/usr/fred/oldstuff/memo

In this tutorial, /usr/fred is our home directory.


The command to create a directory is mkdir. An absolute or relative pathname
can be specified.
$ mkdir /usr/fred/reports
$ mkdir reports/monthly

Setting up directories is a convenient method of organizing your work on the system.


For instance, in writing this book, we set up a directory /work/textp and, under
that, subdirectories for each chapter in the book (/work/textp/ch01,
/work/textp/ch02, etc.). In each of those subdirectories, there are files that
divide the chapter into sections (sect1, sect2, etc.). There is also a subdirectory
set up to hold old versions or drafts of these sections.
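The commands covered so far (pwd, cd, mkdir, and $HOME) can be tried without touching your real files. This sketch assumes a Bourne-compatible shell with mktemp. A scratch directory stands in for /usr/fred, and HOME is reassigned only inside the script so that cd with no argument has somewhere safe to go.

```shell
#!/bin/sh
# A scratch directory stands in for Fred's home directory, /usr/fred.
fred_home=$(mktemp -d) || exit 1
trap 'rm -rf "$fred_home"' EXIT
HOME=$fred_home        # only for this script: cd with no argument goes here

cd                     # no argument: change to $HOME
pwd

mkdir "$HOME/reports"  # absolute pathname
mkdir reports/monthly  # relative pathname, resolved from the current directory
cd reports/monthly
here=$(pwd)
echo "$here"           # $HOME/reports/monthly

cd                     # back home again
back=$(pwd)
echo "$back"
```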

Copying and Moving Files


You can copy, move, and rename files within your current working directory or (by
specifying the full pathname) within other directories on the file system. The cp
command makes a copy of a file, and the mv command can be used to move a file to a new
directory or simply rename it. If you give the name of a new or existing file as the last
argument to cp or mv, the file named in the first argument is copied (cp) or renamed
(mv) to that name. (If the target file already exists, it will be overwritten.) If you
give the name of a directory as the last argument to cp or mv, the file or files named
first will be copied or moved into that directory, and will keep their original names.
Look at the following sequence of commands:
$ pwd                         Print working directory
/usr/fred


$ ls -F                       List contents of current directory
meeting
notes
oldstuff/
reports/
$ mv notes oldstuff           Move notes to oldstuff directory
$ ls                          List contents of current directory
meeting
oldstuff
reports
$ mv meeting meet.306         Rename meeting
$ ls oldstuff                 List contents of oldstuff subdirectory
ch01-draft
letter.212
memo
notes
In this example, the mv command was used to rename the file meeting and to move
the file notes from /usr/fred to /usr/fred/oldstuff. You can also
use the mv command to rename a directory itself.
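Here is the same sequence as a self-contained script, run in a scratch directory so that it is safe to try (this setup, and the use of mktemp, are assumptions of the sketch). The file contents are made up for illustration. The last cp shows that an existing target file is overwritten without warning.

```shell
#!/bin/sh
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

# Recreate a small version of Fred's directory (hypothetical contents).
mkdir oldstuff reports
echo "Agenda for the March meeting" > meeting
echo "Draft notes" > notes

mv notes oldstuff       # last argument is a directory: notes keeps its name
mv meeting meet.306     # last argument is a new name: the file is renamed
cp meet.306 reports     # copy into a directory: reports/meet.306; original stays

echo "old text" > backup
cp meet.306 backup      # an existing target file is silently overwritten

ls -F
ls oldstuff reports
```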

Permissions
Access to UNIX files is governed by ownership and permissions. If you create a file,
you are the owner of the file and can set the permissions for that file to give or deny
access to other users of the system. There are three different levels of permission:

r Read permission allows users to read a file or make a copy of it.


w Write permission allows users to make changes to that file.
x Execute permission signifies a program file and allows other users to
execute this program.

File permissions can be set for three different levels of ownership:

owner The user who created the file is its owner.


group A group to which you are assigned, usually made up of those users
engaged in similar activities and who need to share files among them-
selves.
other All other users on the system, the public.

Thus, you can set read, write, and execute permissions for the three levels of own-
ership. This can be represented as:

rwx    rwx    rwx
owner  group  other

When you enter the command ls -l, information about the status of the file is
displayed on the screen. You can determine what the file permissions are, who the
owner of the file is, and with what group the file is associated.
$ ls -l meet.306
-rw-rw-r-- 1 fred techpubs 126 Mar 6 10:32 meet.306

This file has read and write permissions set for the user fred and the group
techpubs. All others can read the file, but they cannot modify it. Because fred is
the owner of the file, he can change the permissions, making it available to others or
denying them access to it. The chmod command is used to set permissions. For
instance, if he wanted to make the file writable by everyone, he would enter:
$ chmod o+w meet.306
$ ls -l meet.306
-rw-rw-rw- 1 fred techpubs 126 Mar 6 10:32 meet.306

This translates to “add write permission (+w) to others (o).” If he wanted to remove
write permission from a file, keeping anyone but himself from accidentally modifying a
finished document, he might enter:
$ chmod go-w meet.306
$ ls -l meet.306
-rw-r--r-- 1 fred techpubs 126 Mar 6 10:32 meet.306

This command removes write permission (-w) from group (g) and other (o).
File permissions are important in UNIX, especially when you start using a text
editor to create and modify files. They can be used to protect information you have on
the system.
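You can check the symbolic forms of chmod by reading the permission string back from ls -l. In this sketch, the mode u=rw,g=rw,o=r (u stands for the owner, or user) first sets the starting permissions shown above. The command cut -c1-10 then keeps only the ten-character mode field, because some systems append a marker such as + or @ after it. The scratch directory and mktemp are assumptions of the sketch.

```shell
#!/bin/sh
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

echo "Meeting notes" > meet.306
chmod u=rw,g=rw,o=r meet.306          # start from -rw-rw-r--, as in the text
start=$(ls -l meet.306 | cut -c1-10)

chmod o+w meet.306                    # add write permission for others
added=$(ls -l meet.306 | cut -c1-10)

chmod go-w meet.306                   # remove write permission from group and others
removed=$(ls -l meet.306 | cut -c1-10)

echo "$start -> $added -> $removed"
```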

Special Characters
As part of the shell environment, there are a few special characters (metacharacters) that
make working in UNIX much easier. We won’t review all the special characters, but
enough of them to make sure you see how useful they are.
The asterisk (*) and the question mark (?) are filename generation metacharacters.
The asterisk matches any string of characters, including an empty one. By itself, the
asterisk expands to all the names in the specified directory.
$ echo *
meet.306 oldstuff reports

In this example, the echo command displays in a row the names of all the files and
directories in the current directory. The asterisk can also be used as a shorthand
notation for specifying one or more files.
$ ls meet*
meet.306
$ ls /work/textp/ch*
/work/textp/ch01
/work/textp/ch02

/work/textp/ch03
/work/textp/chapter -make

The question mark matches any single character.


$ ls /work/textp/ch01/sect?
/work/textp/ch01/sect1
/work/textp/ch01/sect2
/work/textp/ch01/sect3
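The difference between * and ? is easy to see with echo, which simply prints what the shell has expanded. This sketch builds a hypothetical directory layout loosely modeled on /work/textp in a scratch directory. Setting LC_ALL=C is an assumption added to keep the order of the expanded names predictable.

```shell
#!/bin/sh
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1
LC_ALL=C; export LC_ALL       # predictable order for the expanded names

# Hypothetical layout: three chapter directories and a stray file.
mkdir ch01 ch02 ch03
touch notes ch01/sect1 ch01/sect2 ch01/sect3 ch01/sect10

star=$(echo ch*)              # * matches any string, so notes is left out
question=$(echo ch01/sect?)   # ? matches exactly one character: sect10 is skipped

echo "ch*        -> $star"
echo "ch01/sect? -> $question"
```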

Besides filename metacharacters, there are other characters that have special meaning
when placed in a command line. The semicolon (;) separates multiple commands on
the same command line. Each command is executed in sequence from left to right, one
before the other.
$ cd oldstuff; pwd; ls
/usr/fred/oldstuff
ch01-draft
letter.212
memo
notes

Another special character is the ampersand (&). The ampersand signifies that a com-
mand should be processed in the background, meaning that the shell does not wait for
the program to finish before returning a system prompt. When a program takes a signi-
ficant amount of processing time, it is best to have it run in the background so that you
can do other work at your terminal in the meantime. We will demonstrate background
processing in Chapter 4 when we look at the nroff/troff text formatter.
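You can try the semicolon and the ampersand together. In this sketch, a subshell that sleeps for one second stands in for a slow job such as a formatting run. The special parameter $! holds the process ID of the most recent background command, and wait pauses until that command finishes. The scratch directory and mktemp are assumptions of the example.

```shell
#!/bin/sh
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

# ; runs commands one after another, left to right.
echo first > log; echo second >> log; echo third >> log

# & starts a command in the background; the shell goes on without waiting.
(sleep 1; echo "formatting done" > job.out) &
job=$!                  # process ID of the most recent background command
echo "Prompt is back while job $job runs"

wait "$job"             # pause until that background job has finished
cat log job.out
```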

Environment Variables
The shell stores useful information about who you are and what you are doing in
environment variables. Entering the set command will display a list of the
environment variables that are currently defined in your account.
$ set
PATH .:bin:/usr/bin:/usr/local/bin:/etc
argv 0
cwd /work/textp/ch03
home /usr/fred
shell /bin/sh
status 0
TERM wy50

These variables can be accessed from the command line by prefacing their name with a
dollar sign:
$ echo $TERM
wy50

The TERM variable identifies what type of terminal you are using. It is important that
you correctly define the TERM environment variable, especially because the vi text
editor relies upon it. Shell variables can be reassigned from the command line. Some
variables, such as TERM, need to be exported if they are reassigned, so that they are
available to the programs the shell starts.
$ TERM=tvi925; export TERM    Tell UNIX I’m using a Televideo 925
You can also define your own environment variables for use in commands.
$ friends="alice ed ralph"
$ echo $friends
alice ed ralph

You could use this variable when sending mail.


$ mail $friends
A message to friends
<CTRL-D>

This command sends the mail message to three people whose names are defined in the
friends environment variable. Pathnames can also be assigned to environment vari-
ables, shortening the amount of typing:
$ pwd
/usr/fred
$ book="/work/textp"
$ cd $book
$ pwd
/work/textp
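Whether a variable is exported matters only to the programs the shell starts. This sketch assumes a Bourne-compatible shell. It runs a child shell with sh -c before and after export, showing that only an exported variable is passed along.

```shell
#!/bin/sh
unset friends                          # start clean

friends="alice ed ralph"               # a shell variable, known only to this shell
before=$(sh -c 'echo "[$friends]"')    # a child shell does not inherit it

export friends                         # mark it for export to child processes
after=$(sh -c 'echo "[$friends]"')     # now the child sees the value

echo "before export: $before"
echo "after export:  $after"
```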

Pipes and Filters


Earlier we demonstrated how you can redirect the output of a command to a file. Nor-
mally, command input is taken from the keyboard and command output is displayed on
the terminal screen. A program can be thought of as processing a stream of input and
producing a stream of output. As we have seen, this stream can be redirected to a file.
In addition, it can originate from or be passed to another command.
A pipe is formed when the output of one command is sent as input to the next
command. For example:
$ ls | wc

might produce:
10 10 72

The ls command produces a list of filenames which is provided as input to wc. The
wc command counts the number of lines, words, and characters.
Any program that takes its input from another program, performs some operation
on that input, and writes the result to the standard output is referred to as a filter. Most
UNIX programs are designed to work as filters. This is one reason why UNIX programs
do not print "friendly" prompts or other extraneous information to the user.
Because all programs expect, and produce, only a data stream, that data stream can
easily be processed by multiple programs in sequence.
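A filter need not be a separate program. Any command that reads standard input and writes standard output fits into a pipe. In this sketch, shout is a hypothetical shell function built on tr, placed between echo and wc.

```shell
#!/bin/sh
# A hypothetical filter: read standard input, write it upper-cased to standard output.
shout() {
    tr '[:lower:]' '[:upper:]'
}

# Each stage sees only a stream of text, so filters chain freely.
upper=$(echo "proofread chapter 2" | shout)
words=$(echo "proofread chapter 2" | shout | wc -w | tr -d ' ')

echo "$upper"
echo "$words words"
```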
One of the most common uses of filters is to process output from a command.
Usually, the processing modifies it by rearranging it or reducing the amount of informa-
tion it displays. For example:
$ who                  List who is on the system, and at which terminal
peter tty001 Mar 6 17:12
walter tty003 Mar 6 13:51
chris tty004 Mar 6 15:53
val tty020 Mar 6 15:48
tim tty005 Mar 4 17:23
ruth tty006 Mar 6 17:02
fred tty000 Mar 6 10:34
dale tty008 Mar 6 15:26
$ who | sort           List the same information in alphabetic order
chris tty004 Mar 6 15:53
dale tty008 Mar 6 15:26
fred tty000 Mar 6 10:34
peter tty001 Mar 6 17:12
ruth tty006 Mar 6 17:02
tim tty005 Mar 4 17:23
val tty020 Mar 6 15:48
walter tty003 Mar 6 13:51
$

The sort program arranges lines of input in alphabetic or numeric order; by default, it
sorts lines alphabetically. Another frequently used filter, especially in text-processing
environments, is grep, perhaps UNIX’s most renowned program. The
grep program selects lines containing a pattern:
$ who | grep tty001    Find out who is on terminal 1
peter tty001 Mar 6 17:12

One of the beauties of UNIX is that almost any program can be used to filter the output
of any other. The pipe is the master key to building command sequences that go
beyond the capabilities provided by a single program and allow users to create custom
“programs” of their own to meet specific needs.
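Because who reports whoever happens to be logged in, this sketch feeds a few saved lines in the same format (made up for illustration) through sort and grep instead. Setting LC_ALL=C is an assumption added so that sort compares plain byte values.

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL   # byte-order sorting, independent of locale

# Sample lines in the format of who output.
sample='walter tty003 Mar 6 13:51
peter tty001 Mar 6 17:12
chris tty004 Mar 6 15:53
fred tty000 Mar 6 10:34'

printf '%s\n' "$sample" | sort            # alphabetic order by user name
printf '%s\n' "$sample" | grep tty001     # only the line for terminal 1

# Longer chains work the same way: sort, keep the first line, keep its first field.
first_user=$(printf '%s\n' "$sample" | sort | head -n 1 | cut -d' ' -f1)
echo "$first_user"
```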
If a command line gets too long to fit on a single screen line, simply type a
backslash followed by a carriage return, or (if a pipe symbol comes at the appropriate
place) a pipe symbol followed by a carriage return. Instead of executing the command,
the shell will give you a secondary prompt (usually >) so you can continue the line:
$ echo This is a long line shown here as a demonstration |
> wc
1 10 50
This feature works in the Bourne shell only.
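The same continuation rules apply inside a shell script, where no secondary prompt appears. This sketch counts the words and characters in the demonstration line twice: once with a trailing pipe symbol and once with a trailing backslash. The tr -d ' ' step is only there to strip the padding that some versions of wc add.

```shell
#!/bin/sh
# A trailing | lets a pipeline continue on the next line.
words=$(echo This is a long line shown here as a demonstration |
    wc -w |
    tr -d ' ')

# A trailing backslash continues any command on the next line.
chars=$(echo This is a long line shown here \
    as a demonstration | wc -c | tr -d ' ')

echo "$words words, $chars characters"
```

The character count includes the newline that echo adds at the end of its output.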

Shell Scripts

A shell script is a file that contains a sequence of UNIX commands. Part of the flexi-
bility of UNIX is that anything you enter from the terminal can be put in a file and exe-
cuted. To give a simple example, we’ll assume that the last command example (grep)
has been stored in a file called whoison:
$ cat whoison
who | grep tty001
The permissions on this file must be changed to make it executable. After a file
is made executable, its name can be entered as a command.
$ chmod +x whoison
$ ls -l whoison
-rwxrwxr-x 1 fred doc 123 Mar 6 17:34 whoison
$ whoison
peter tty001 Mar 6 17:12
Shell scripts can do more than simply function as a batch command facility. The basic
constructs of a programming language are available for use in a shell script, allowing
users to perform a variety of complicated tasks with relatively simple programs.
The simple shell script shown above is not very useful because it is too specific.
However, instead of specifying the name of a single terminal line in the file, we can
read the name as an argument on the command line. In a shell script, $1 represents
the first argument on the command line.
$ cat whoison
who | grep $1
Now we can find who is logged on to any terminal:
$ whoison tty004
chris tty004 Mar 6 15:53
Later in this book, we will look at shell scripts in detail. They are an important part of
the writer’s toolbox, because they provide the “glue” for users of the UNIX system:
the mechanism by which all the other tools can be made to work together.
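Because the output of who changes from minute to minute, this sketch uses termuser, a hypothetical variant of whoison that reads who-style lines from standard input instead of running who itself. It follows the steps described above: write the script, make it executable with chmod +x, and pass the terminal name as $1. Quoting "$1" is a small extra precaution that keeps an argument containing spaces intact. The sketch also assumes mktemp and a scratch directory where files may be executed.

```shell
#!/bin/sh
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

# termuser: a hypothetical variant of whoison that reads from standard input.
cat > termuser <<'EOF'
#!/bin/sh
# $1 is the first command-line argument: the terminal to look for.
grep "$1"
EOF
chmod +x termuser       # make the script executable

sample='peter tty001 Mar 6 17:12
chris tty004 Mar 6 15:53'

found=$(printf '%s\n' "$sample" | ./termuser tty004)
echo "$found"
```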
CHAPTER 3

Learning vi

UNIX has a number of editors that can process the contents of readable files, whether
those files contain data, source code, or text. There are line editors, such as ed and
ex, which display a line of the file on the screen, and there are screen editors, such as
vi and emacs, which display a part of the file on your terminal screen.
The most useful standard text editor on your system is vi. Unlike emacs, it is
available in nearly identical form on almost every UNIX system, thus providing a kind
of text editing lingua franca. The same might be said of ed and ex, but screen
editors are generally much easier to use. With a screen editor you can scroll the page,
move the cursor, delete lines, insert characters, and more, while seeing the results of
your edits as you make them. Screen editors are very popular because they allow you
to make changes as you read a file, much as you would edit a printed copy, only faster.
To many beginners, vi looks unintuitive and cumbersome: instead of letting
you type normally and use special control keys for word-processing functions, it uses all
of the regular keyboard keys for issuing commands. You must be in a special insert
mode before you can type. In addition, there seem to be so many commands.
You can’t learn vi by memorizing every single vi command. Begin by learn-
ing some basic commands. As you do, be aware of the patterns of usage that com-
mands have in common. Be on the lookout for new ways to perform tasks, experiment-
ing with new commands and combinations of commands.
As you become more familiar with vi, you will find that you need fewer
keystrokes to tell vi what to do. You will learn shortcuts that transfer more and more of
the editing work to the computer, where it belongs. Not as much memorization is
required as first appears from a list of vi commands. Like any skill, the more editing
you do, the more you know about it and the more you can accomplish.
This chapter has three sections, and each one corresponds to a set of material
about vi that you should be able to tackle in a single session. After you have finished
each session, put aside the book for a while and do some experimenting. When you
feel comfortable with what you have learned, continue to the next session.


Session 1: Basic Commands


The first session contains the basic knowledge you need to operate the vi editor.
After a general description of vi, you are shown some simple operations. You will
learn how to

open and close a file;


give commands and insert text;
move the cursor;
edit text (change, delete, and copy).

You can use vi to edit any file that contains readable text, whether it is a report, a
series of shell commands, or a program. The vi editor copies the file to be edited into
a buffer (an area temporarily set aside in memory), displays as much of the buffer as
possible on the screen, and lets you add, delete, and move text. When you save your
edits, vi copies the buffer into a permanent file, overwriting the contents of the old
file.

Opening a File
The syntax for the vi command is:
vi [filename]

where filename is the name of either an existing file or a new file. If you don’t specify
a filename, vi will open an unnamed buffer, and ask you to name it before you can
save any edits you have made. Press RETURN to execute the command.
A filename must be unique inside its directory. On AT&T (System V) UNIX systems,
it cannot exceed 14 characters. (Berkeley UNIX systems allow longer filenames.)
A filename can include any ASCII character except / (which is reserved as the separator
between files and directories in a pathname) and the null character. You can even
include spaces in a filename by “escaping” them with a backslash. In practice, though,
filenames consist of any combination of uppercase and lowercase letters, numbers, and
the characters . (dot) and _ (underscore). Remember that UNIX is case-sensitive: lowercase
filenames are distinct from uppercase filenames, and, by convention, lowercase is preferred.
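You can check the effect of escaping a space with touch, which creates empty files. In this sketch (run in a scratch directory, assuming mktemp), the escaped name produces one file and the unescaped one produces two. The name chapter_1.draft is a hypothetical example of the conventional style.

```shell
#!/bin/sh
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

touch my\ notes         # backslash escapes the space: one file named "my notes"
touch my notes          # no escape: two files, "my" and "notes"
touch chapter_1.draft   # conventional style: letters, digits, dot, underscore

count=$(ls | wc -l | tr -d ' ')
echo "$count files:"
ls -1
```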
If you want to open a new file called notes in the current directory, enter:
$ vi notes
The vi command clears the screen and displays a new buffer for you to begin work.
Because notes is a new file, the screen displays a column of tildes (~) to indicate
that there is no text in the file, not even blank lines.

If you specify the name of a file that already exists, its contents will be displayed on the
screen. For example:
$ vi letter

might bring a copy of the existing file letter to the screen.

Mr. John Fust
Vice President, Research and Development
Gutenberg Galaxy Software
Waltham, Massachusetts 02154

Dear Mr. Fust:

In our conversation last Thursday, we discussed a
documentation project that would produce a user's manual
on the Alcuin product. Yesterday, I received the product
demo and other materials that you sent me.
~
~
~
~
"letter" 11 lines, 250 characters

The prompt line at the bottom of the screen echoes the name and size of the file.

Sometimes when you invoke vi, you may get either of the following messages:
[using open mode]

or:
Visual needs addressable cursor or upline capability

In both cases, there is a problem identifying the type of terminal you are using. You
can quit the editing session immediately by typing :q.
Although vi can run on almost any terminal, it must know what kind of terminal
you are using. The terminal type is usually set as part of the UNIX login sequence. If
you are not sure whether your terminal type is defined correctly, ask your system
administrator or an experienced user to help you set up your terminal. If you know
your terminal type (wy50 for instance), you can set your TERM environment variable
with the following command:
TERM=wy50; export TERM

vi Commands
The vi editor has two modes: command mode and insert mode. Unlike many word
processors, vi’s command mode is the initial or default mode. To insert lines of text,
you must give a command to enter insert mode and then type away.
Most commands consist of one or two characters. For example:
i insert
c change

Using letters as commands, you can edit a file quickly. You don’t have to
memorize banks of function keys or stretch your fingers to reach awkward combinations
of keys.
In general, vi commands

are case-sensitive (uppercase and lowercase keystrokes mean different things;


e.g., I is different from i);
are not echoed on the screen;
do not require a RETURN after the command.

There is also a special group of commands that echo on the bottom line of the
screen. Bottom-line commands are indicated by special symbols. The slash (/) and the
question mark (?) begin search commands, which are discussed in Session 2. A colon
(:) indicates an ex command. You are introduced to one ex command (to quit a file
without saving edits) in this chapter, and the ex line editor is discussed in detail in
Chapter 7.
To tell vi that you want to begin insert mode, press i. Nothing appears on the
screen, but you can now type any text at the cursor. To tell vi to stop inserting text,
press ESC and you will return to command mode.

For example, suppose that you want to insert the word introduction. If you type
the keystrokes iintroduction, what appears on the screen is
introduction

Because you are starting out in command mode, vi interprets the first keystroke (i) as
the insert command. All keystrokes after that result in characters placed in the file,
until you press ESC. If you need to correct a mistake while in insert mode, backspace
and type over the error.
While you are inserting text, press RETURN to break the lines before the right
margin. An autowrap option provides a carriage return automatically after you exceed
the right margin. To move the right margin in ten spaces, for example, enter :set
wm=10.
Sometimes you may not know if you are in insert mode or command mode.
Whenever vi does not respond as you expect, press ESC. When you hear a beep, you
are in command mode.

Saving a File
You can quit working on a file at any time, save the edits, and return to the UNIX
prompt. The vi command to quit and save edits is ZZ. (Note that ZZ is
capitalized.)
Let’s assume that you create a file called letter to practice vi commands
and that you type in 36 lines of text. To save the file, first check that you are in
command mode by pressing ESC, and then give the save-and-quit command, ZZ. Your
file is saved as a regular file. The result is:
“letter” [New file] 36 lines, 1331 characters

You return to the UNIX prompt. If you check the list of files in the directory, by
typing ls at the prompt, the new file is listed:
$ ls
ch01 ch02 letter
You now know enough to create a new file. As an exercise, create a file called
letter and insert the text shown in Figure 3-1. When you have finished, type ZZ to
save the file and return to the UNIX prompt.

Moving the Cursor


Only a small percentage of time in an editing session may be spent adding new text in
insert mode. Much of the time, you will be editing existing text.
In command mode, you can position the cursor anywhere in the file. You start all
basic edits (changing, deleting, and copying text) by placing the cursor at the text that
you want to change. Thus, you want to be able to quickly move the cursor to that
place.

April 1, 1987

Mr. John Fust
Vice President, Research and Development
Gutenberg Galaxy Software
Waltham, Massachusetts 02159

Dear Mr. Fust:

In our conversation last Thursday, we discussed a
documentation project that would produce a user's
manual on the Alcuin product. Yesterday, I received
the product demo and other materials that you sent me.

Going through a demo session gave me a much better
understanding of the product. I confess to being
amazed by Alcuin. Some people around here, looking
over my shoulder, were also astounded by the
illustrated manuscript I produced with Alcuin. One
person, a student of calligraphy, was really impressed.

Today, I'll start putting together a written plan
that shows different strategies for documenting
the Alcuin product. After I submit this plan, and
you have had time to review it, let's arrange a
meeting at your company to discuss these strategies.

Thanks again for giving us the opportunity to bid on
this documentation project. I hope we can decide upon
a strategy and get started as soon as possible in order
to have the manual ready in time for the first customer
shipment. I look forward to meeting with you towards
the end of next week.

Sincerely,

Fred Caslon

Fig. 3-1. A sample letter entered with vi



There are vi commands to move

up, down, left, or right, one character at a time;


forward or backward by blocks of text such as words, sentences, or paragraphs;
forward or backward through a file, one screen at a time.

To move the cursor, make sure you are in command mode by pressing ESC. Give the
command for moving forward or backward in the file from the current cursor position.
When you have gone as far in one direction as possible, you’ll hear a beep and the
cursor stops. You cannot move the cursor past the tildes (~) at the end of the file.

Single Movements
The keys h, j, k, and l, right under your fingertips, will move the cursor:
h left one space
j down one line
k up one line
l right one space

You could use the cursor arrow keys (←, ↓, ↑, →) or the RETURN and BACKSPACE
keys, but they are out of the way and are not supported on all terminals.
You can also combine the h, j, k, and l keys with numeric arguments and
other vi commands.

Numeric Arguments
You can precede movement commands with numbers. The command 4l moves the
cursor four spaces to the right, just like typing the letter l four times (llll).

4l move right 4 characters

This one concept (being able to multiply commands) gives you more options (and
power) for each command. Keep it in mind as you are introduced to additional
commands.

Movement by Lines
When you saved the file letter, the editor displayed a message telling you how
many lines were in that file. A line in the file is not necessarily the same length as a
physical line (limited to 80 characters) that appears on the screen. A line is any text
entered between carriage returns. If you type 200 characters before pressing RETURN,
vi regards all 200 characters as a single line (even though those 200 characters look
like several physical lines on the screen).
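Other UNIX tools share the same definition of a line: everything up to the next newline. This sketch builds a single 200-character line and lets wc confirm that it is one line, however it would wrap on an 80-column screen. Using printf's zero padding to build the line is simply a convenient assumption of the example.

```shell
#!/bin/sh
# Build one 200-character line: printf pads the number 0 with zeros to width 200.
line=$(printf '%0200d' 0)

# However it wraps on screen, it is still one line of the file.
lines=$(printf '%s\n' "$line" | wc -l | tr -d ' ')
chars=$(printf '%s\n' "$line" | wc -c | tr -d ' ')   # includes the final newline

echo "$lines line, $chars characters"
```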
Two useful commands in line movement are:
0 <zero> move to beginning of line
$ move to end of line

In the following file, the line numbers are shown. To get line numbers on your screen,
enter :set nu.
1 With the screen editor you can scroll the page,
2 move the cursor, delete lines, and insert characters,
  while seeing the results of edits as you make them.
3 Screen editors are very popular.

The number of logical lines (3) does not correspond to the number of physical lines (4)
that you see on the screen. If you enter $, with the cursor positioned on the d in the
word delete, the cursor would move to the period following the word them.
1 With the screen editor you can scroll the page,
2 move the cursor, delete lines, and insert characters,
  while seeing the results of edits as you make them.
3 Screen editors are very popular.

If you enter 0 (zero), the cursor would move back to the letter m in the word move, at the
beginning of the line.
1 With the screen editor you can scroll the page,
2 move the cursor, delete lines, and insert characters,
  while seeing the results of edits as you make them.
3 Screen editors are very popular.

If you do not use the automatic wraparound option (:set wm=10) in vi, you
must break lines with carriage returns to keep the lines of manageable length.

Movement by Text Blocks


You can also move the cursor by blocks of text (words, sentences, or paragraphs).
The command w moves the cursor forward one word at a time, treating symbols
and punctuation marks as equivalent to words. The following line shows cursor move-
ment caused by ten successive w commands:
move the cursor, delete lines, and insert characters,

You can also move forward one word at a time, ignoring symbols and punctuation
marks, using the command W (note the uppercase W). It causes the cursor to move to
the first character following a blank space. Cursor movement using W looks like this:
move the cursor, delete lines, and insert characters,
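To make the difference between w and W concrete, here is a minimal Python sketch of where repeated word motions stop on one line. It is a simplified model under stated assumptions, not vi's own code: a small word is taken to be a run of letters, digits, and underscores or a run of other nonblank characters, while a big word (W) is any run of nonblank characters. The function name word_starts is hypothetical.

```python
def word_starts(line, big_words=False):
    """Columns where successive w (or W) motions stop on a single line.

    Simplified model (an assumption, not vi's source): blanks separate words;
    for w, a run of letters/digits/underscores and a run of other nonblank
    characters count as different words; for W, any nonblank run is one word.
    """
    def char_class(ch):
        if ch.isspace():
            return 0                      # blank
        if big_words or ch.isalnum() or ch == "_":
            return 1                      # word character (or any nonblank for W)
        return 2                          # punctuation or symbol

    starts = []
    previous = 0                          # the line start behaves like a blank
    for col, ch in enumerate(line):
        current = char_class(ch)
        if current != 0 and current != previous:
            starts.append(col)            # a new word begins here
        previous = current
    return starts


line = "move the cursor, delete lines, and insert characters,"
print(len(word_starts(line)) - 1)        # stops reached by repeated w from column 0
print(len(word_starts(line, True)) - 1)  # stops reached by repeated W from column 0
```

Under these assumptions the sample line gives ten w stops after the starting position, matching the ten w commands described above, but only seven W stops, because the commas no longer count as separate words.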
32 0 UNlX Text Processing 0

To move backward one word at a time, use the command b. The B command allows
you to move backward one word at a time, ignoring punctuation.
With either the w, W, b, or B commands, you can multiply the movement with
numbers. For example, 2w moves forward two words; 5B moves back five words,
ignoring punctuation. Practice using the cursor movement commands, combining them
with numeric multipliers.

Simple Edits
When you enter text in your file, it is rarely perfect. You find errors or want to
improve a phrase. After you enter text, you have to be able to change it.
What are the components of editing? You want to insert text (a forgotten word or
a missing sentence). And you want to delete text (a stray character or an entire para-
graph). You also need to change letters and words (correct misspellings or reflect a
change of mind). You want to move text from one place to another part of your file.
And on occasion, you want to copy text to duplicate it in another part of your file.
There are four basic edit commands: i for insert (which you have already seen),
c for change, d for delete, and y for yank (copy); to move text, you combine d with
p (delete and put). Each type of edit is described in this section. Table 3-1 gives a
few simple examples.

TABLE 3-1. Basic Editing Commands

Object                  Change     Delete     Copy (Yank)
One word                cw         dw         yw
Two words               2cw        2dw        2yw
Three words back        3cb        3db        3yb
One line                cc         dd         yy or Y
To end of line          c$ or C    d$ or D    y$
To beginning of line    c0         d0         y0
Single character        r          x          yl

Inserting New Text


You have already used the insert command to enter text into a new file. You also use
the insert command while editing existing text to add characters, words, and sentences.
Suppose you have to insert Today, at the beginning of a sentence. Enter the follow-
ing sequence of commands and text:
0 Learningvi 0 33

I'll start putting
together a written
plan that shows
different strategies
[3k: move up 3 lines, from the last line to the first]
I'll start putting
together a written
plan that shows
different strategies
[iToday, <ESC>: insert Today,]
Today, I'll start putting
together a written
plan that shows
different strategies

In the previous example, vi moves existing text to the right as the new text is inserted.
That is because we are showing vi on an “intelligent” terminal, which can adjust the
screen with each character you type. An insert on a “dumb” terminal (such as an
adm3a) will look different. The terminal itself cannot update the screen for each character
typed (without a tremendous sacrifice of speed), so vi doesn’t rewrite the screen
until after you press ESC. Rather, when you type, the dumb terminal appears to
overwrite the existing text. When you press ESC, the line is adjusted immediately so
that the missing characters reappear. Thus, on a dumb terminal, the same insert would
appear as follows:

I'll start putting
together a written
plan that shows
different strategies
[iToday, : insert Today,]
Today, art putting
together a written
plan that shows
different strategies
[<ESC>: leave insert mode]
Today, I'll start putting
together a written
plan that shows
different strategies

Changing Text
You can replace any text in your file with the change command, c. To identify the
amount of text that you want replaced, combine the change command with a movement
command. For example, c can be used to change text from the cursor:
cw to the end of a word
2cb back two words
c$ to the end of a line

Then you can replace the identified text with any amount of new text: no characters at
all, one word, or hundreds of lines. The c command leaves you in insert mode until
you press the ESC key.

Words
You can replace a word (cw) with a longer word, a shorter word, or any amount of text.
The cw command can be thought of as “delete the word marked and insert new text
until ESC is pressed.”
Suppose that you have the following lines in your file letter and want to
change designing to putting together. You only need to change one word.

I'll start
designing a
[cw: change a word]
I'll start
designin$ a

Note that the cw command places a $ at the last character of the word to be changed.

designin$ a
[putting together <ESC>: enter change]
putting together a

The cw command also works on a portion of a word. For example, to change
putting to puts, position the cursor on the second t, enter cw, then type s and press
ESC. By using numeric prefixes, you can change multiple words or characters immediately.
For example:
3cw change three words to the right of the cursor
5cl change five letters to the right of the cursor

You don’t need to replace the specified number of words, characters, or lines with a like
amount of text. For example:

I'll start
putting together a
[2cw designing <ESC>: change two words]
I'll start
designing a

Lines
To replace the entire current line, there is the special change command cc. This command
changes an entire line, replacing that line with the text entered before an ESC.
The cc command replaces the entire line of text, regardless of where the cursor is
located on the line.
The C command replaces characters from the current cursor position to the end
of the line. It has the same effect as combining c with the special end-of-line indicator,
$ (as in c$).

Characters
One other replacement edit is performed with the r command. This command replaces
a single character with another single character. One of its uses is to correct misspel-
lings. You probably don't want to use c w in such an instance, because you would
have to retype the entire word. Use r to replace a single character at the cursor:

Yasterday, I received
[re: replace a with e]
Yesterday, I received

The r command makes only a single character replacement. You do not have to press
ESC to finish the edit. Following an r command, you are automatically returned to
command mode.

Deleting Text
You can also delete any text in your file with the delete command, d. Like the change
command, the delete command requires an argument (the amount of text to be operated
on). You can delete by word (dw), by line (dd and D), or by other movement com-
mands that you will learn later.
With all deletions, you move to where you want the edit to take place and enter
the delete command (d) followed by the amount of text to be deleted (such as a text
object, w for word).

Words
Suppose that in the following text you want to delete one instance of the word start in
the first line.

Today, I'll start
start putting together
a written plan
thatth shows different
[dw: delete word]
Today, I'll
start putting together
a written plan
thatth shows different

The dw command deletes from the cursor's position to the end of a word. Thus, dw
can be used to delete a portion of a word.

thatth shows different
[dw with the cursor on the stray th: delete word]
thatshows different

As you can see, dw deleted not only the remainder of the word, but also the space
before any subsequent word on the same line. To retain the space between words, use
de, which will delete only to the end of the word.

thatth shows different
[de: delete to word end]
that shows different
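The difference between dw and de can be sketched in a few lines of Python. The function delete_word is a hypothetical name; the sketch assumes a word is any run of nonblank characters and that the cursor starts inside a word, and it ignores vi's finer rules for punctuation.

```python
def delete_word(line, col, to_word_end=False):
    """Simplified single-line model of dw (default) and de (to_word_end=True).

    Assumption: a word is any run of nonblank characters, and the cursor
    starts inside a word; vi's real rules for punctuation are ignored.
    """
    end = col
    while end < len(line) and not line[end].isspace():
        end += 1                 # stop just past the last character of the word
    if not to_word_end:
        while end < len(line) and line[end].isspace():
            end += 1             # dw also removes the blanks before the next word
    return line[:col] + line[end:]


text = "thatth shows different"
print(delete_word(text, 4))                    # dw with the cursor on the stray th
print(delete_word(text, 4, to_word_end=True))  # de from the same position
```

The only difference between the two calls is whether the trailing blanks are swallowed, which is exactly why de keeps the space between words.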

You can also delete backwards (db) or to the end or beginning of a line (d$ or d0).

Lines
The dd command deletes the entire line that the cursor is on. Using the same text as
in the previous example, with the cursor positioned on the first line as shown, you can
delete the first two lines:

Today, I'll
start putting together
a written plan
that shows different
[2dd: delete first 2 lines]
a written plan
that shows different

If you are using a dumb terminal or one working at less than 1200 baud, line deletions
look different. The dumb or slow terminal will not redraw the screen until you scroll
past the bottom of the screen. Instead the deletion appears as:

@
@
a written plan
that shows different

An @ symbol “holds the place” of the deleted line, until the terminal redraws the
entire screen. (You can force vi to redraw the screen immediately by pressing either
CTRL-L or CTRL-R, depending on the terminal you’re using.)
The D command deletes from the cursor position to the end of the line:

Today, I'll start
putting together a
written plan
that shows different
[D: delete to end of line]
Today, I'll start
putting together a
written plan
that

You cannot use numeric prefixes with the D command.

Characters
Often, while editing a file, you want to delete a single character or two. Just as r
changes one character, x deletes a single character. The x command deletes any character
the cursor is on. In the following line, you can delete the extra letter l by pressing x.

Today, I'lll start
putting
[x: delete character]
Today, I'll start
putting

The X command deletes the character before the cursor. Prefix either of these commands
with a number to delete that number of characters. For example, 5X will delete
the five characters to the left of the cursor.
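Counted x and X can be modeled on a single line of text. In this Python sketch, delete_chars is a hypothetical helper, and clipping a count at the edge of the line is an assumption of the sketch rather than documented vi behavior.

```python
def delete_chars(line, col, count=1, before_cursor=False):
    """Single-line model of [count]x and [count]X; returns (line, cursor column).

    Assumption: counts larger than the available characters are clipped at the
    line boundary rather than rejected.
    """
    if before_cursor:                              # X deletes to the left
        start = max(col - count, 0)
        return line[:start] + line[col:], start
    end = min(col + count, len(line))              # x deletes at and right of cursor
    new_line = line[:col] + line[end:]
    return new_line, min(col, max(len(new_line) - 1, 0))


print(delete_chars("Today, I'lll start", 10))               # x removes the extra l
print(delete_chars("abcdefgh", 6, 5, before_cursor=True))   # 5X from the g
```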

Moving Text
You can move text by deleting it and then placing that deleted text elsewhere in the file,
like a “cut and paste.” Each time you delete a text block, that deletion is temporarily
saved in a buffer. You can move to another position in the file and use the put com-
mand to place the text in a new position. Although you can move any block of text,
this command sequence is more useful with lines than with words.
The put command, p, places saved or deleted text (in the buffer) after the cursor
position. The uppercase version of the command, P, puts the text before the cursor. If
you delete one or more lines, p puts the deleted text on a new line(s) below the cursor.
If you delete a word, p puts the deleted text on the same line after the cursor.
Suppose that in your file letter you have the following lines and you want to
move the fourth line of text. Using delete, you can make this edit. First delete the line
in question:

Today, I'll start
putting together a
plan for documenting
the Alcuin product
that shows
[dd: delete line]
Today, I'll start
putting together a
plan for documenting
that shows

Then use p to restore the deleted line at the next line below the cursor:

Today, I'll start
putting together a
plan for documenting
that shows
[p: restore deleted line]
Today, I'll start
putting together a
plan for documenting
that shows
the Alcuin product

You can also use xp (delete character and put after cursor) to transpose two letters.
For example, in the word mvoe, the letters vo are transposed (reversed). To correct this,
place the cursor on v and press x then p.
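The xp trick works because x leaves the cursor on the character that followed the deleted one, and p puts the buffered character after the cursor. This hypothetical Python helper, transpose_with_xp, models only that case, with the cursor not on the last character of the line.

```python
def transpose_with_xp(word, col):
    """Model xp: x removes the character under the cursor into the buffer,
    leaving the cursor on the character that followed it; p then puts the
    buffered character after the cursor.

    Assumption: the cursor is not on the last character of the line (there,
    vi moves the cursor left after x, which this sketch does not model).
    """
    if not 0 <= col < len(word) - 1:
        raise ValueError("cursor must be before the last character")
    buffered = word[col]
    after_x = word[:col] + word[col + 1:]
    return after_x[:col + 1] + buffered + after_x[col + 1:]


print(transpose_with_xp("mvoe", 1))   # cursor on v
```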
After you delete the text, you must restore it before the next change or delete
command. If you make another edit that affects the buffer, your deleted text will be
lost. You can repeat the put command over and over, as long as you don’t make a new
edit. In the advanced vi chapter, you will learn how to retrieve text from named and
numbered buffers.

Copying Text
Often, you can save editing time (and keystrokes) by copying part of your file to
another place. You can copy any amount of existing text and place that copied text
elsewhere in the file with the two commands y (yank) and p (put). The yank com-
mand is used to get a copy of text into the buffer without altering the original text.
This copy can then be placed elsewhere in the file with the put command.
Yank can be combined with any movement command (for example, yw, y$, or
4yy). Yank is most frequently used with a line (or more) of text, because to yank and
put a word generally takes longer than simply inserting the word. For example, to yank
five lines of text:

on the Alcuin product.
Yesterday, I received
the product demo
and other materials
that you sent me.
[5yy: yank 5 lines; the text is unchanged, and the bottom line reports:]
5 lines yanked

To place the yanked text, move the cursor to where you want to put the text, and
use the p command to insert it below the current line, or P to insert it above the
current line.

that you sent me.
[p: place yanked text]
that you sent me.
on the Alcuin product.
Yesterday, I received
the product demo
and other materials
that you sent me.
[the bottom line reports: 5 more lines]

The yanked text will appear on the line below the cursor. Deleting uses the same buffer
as yanking. Delete and put can be used in much the same way as yank and put. Each
new deletion or yank replaces the previous contents of the yank buffer. As we’ll see
later, up to nine previous yanks or deletions can be recalled with put commands.
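The shared buffer behind yank, delete, and put can be modeled with a small Python class. UnnamedBuffer is a hypothetical name, and the sketch handles only whole-line yy, dd, p, and P; it leaves out the named and numbered buffers.

```python
class UnnamedBuffer:
    """Toy model of line-wise yy, dd, p, and P sharing one buffer.

    Assumption: only whole-line operations are modeled; the named and
    numbered buffers mentioned in the text are left out.
    """

    def __init__(self, lines):
        self.lines = list(lines)
        self.buffer = []

    def yank(self, row, count=1):           # [count]yy: copy, text unchanged
        self.buffer = self.lines[row:row + count]

    def delete(self, row, count=1):         # [count]dd: copy, then remove
        self.buffer = self.lines[row:row + count]
        del self.lines[row:row + count]

    def put(self, row, above=False):        # p puts below the line, P above it
        at = row if above else row + 1
        self.lines[at:at] = self.buffer


doc = UnnamedBuffer(["Today, I'll start", "the Alcuin product", "that shows"])
doc.delete(1)                 # dd on the second line
doc.put(1)                    # p below "that shows"
print(doc.lines)
doc.yank(0)                   # yy replaces the deleted line in the buffer
print(doc.buffer)
```

Because yank and delete both overwrite the same buffer, a put after a new yank no longer returns the earlier deletion, which is the behavior the text warns about.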

Using Your Last Command


Each command that you give is stored in a temporary buffer until you give the next
command. If you insert the after a word in your file, the command used to insert the
text, along with the text that you entered, is temporarily saved. Anytime you are mak-
ing the same editing command repeatedly, you can save time by duplicating the com-
mand with . (dot). To duplicate a command, position the cursor anywhere on the
screen, and press . to repeat your last command (such as an insertion or deletion) in
the buffer. You can also use numeric arguments (as in 2.) to repeat the previous command
more than once.
Suppose that you have the following lines in your file letter. Place the cursor
on the line you want to delete:

Yesterday, I received
the product demo.
Yesterday, I received
other materials
[dd: delete line]
Yesterday, I received
the product demo.
other materials
[.: repeat last command (dd), with the cursor on the first line]
the product demo.
other materials

In some versions of vi, the command CTRL-@ (^@) repeats the last insert (or
append) command. This is in contrast to the . command, which repeats the last command
that changed the text, including delete or change commands.
You can also undo your last command if you make an error. To undo a com-
mand, the cursor can be anywhere on the screen. Simply press u to undo the last com-
mand (such as an insertion or deletion).
To continue the previous example:

the product demo.
other materials
[u: undo last command]
Yesterday, I received
the product demo.
other materials

The uppercase version of u (U) undoes all edits on a single line, as long as the cursor
remains on that line. After you move off a line, you can no longer use U.

Joining Two Lines with J


Sometimes while editing a file, you will end up with a series of short lines that are dif-
ficult to read. When you want to merge two lines, position the cursor anywhere on the
first line and press J to join the two lines.

Yesterday,
I received
the product demo.
[J: join lines]
Yesterday, I received
the product demo.

A numeric argument joins that number of consecutive lines.

Quitting without Saving Edits


When you are first learning vi, especially if you are an intrepid experimenter, there is
one other command that is handy for getting out of any mess that you might create.
You already know how to save your edits with ZZ, but what if you want to wipe out
all the edits you have made in a session and return to the original file?
You can quit vi without saving edits with a special bottom-line command based
on the ex line editor. The ex commands are explained fully in the advanced vi
chapter, but for basic vi editing you should just memorize this command:
:q! <RETURN>

The q! command quits the file you are in. All edits made since the last time you
saved the file are lost.
You can get by in vi using only the commands you have learned in this session.
However, to harness the real power of vi (and increase your own productivity) you
will want to continue to the next session.

Session 2: Moving Around in a Hurry


You use vi not only to create new files but also to edit existing files. You rarely open
a file at the first line and move through it line by line. You want to get to a
specific place in a file and start work.
All edits begin with moving the cursor to where the edit begins (or, with ex line
editor commands, identifying the line numbers to be edited). This session shows you
how to think about movement in a variety of ways (by screens, text, patterns, or line
numbers). There are many ways to move in vi, because editing speed depends on getting
to your destination with only a few keystrokes.

In this session, you will learn how to move around in a file by

screens;
text blocks;
searches for patterns;
lines.

Movement by Screens
When you read a book you think of “places” in the book by page: the page where you
stopped reading or the page number in an index. Some vi files take up only a few
lines, and you can see the whole file at once. But many files have hundreds of lines.
You can think of a vi file as text on a long roll of paper. The screen is a window
of (usually) 24 lines of text on that long roll. In insert mode, as you fill up the
screen with text, you will end up typing on the bottom line of the screen. When you
reach the end and press RETURN, the top line rolls out of sight, and a blank line for
new text appears on the bottom of the screen. This is called scrolling. You can move
through a file by scrolling the screen ahead or back to see any text in the file.

Scrolling the Screen


There are vi commands to scroll forward and backward through the file by full and
half screens:
^F forward one screen
^B backward one screen
^D forward half screen
^U backward half screen

(The symbol ^ represents the CTRL key. ^F means to simultaneously press the
CTRL key and the F key.)

In our conversation last Thursday, we
discussed a documentation project that would
produce a user's manual on the Alcuin product.
Yesterday, I received the product demo and
other materials that you sent me.

Going through a demo session gave me a
much better understanding of the product. I
confess to being amazed by Alcuin. Some
confess t o being amazed by Alcuin. Some

If you press ^F, the screen appears as follows:

much better understanding of the product. I
confess to being amazed by Alcuin. Some
people around here, looking over my shoulder,
were also astounded by the illustrated
manuscript I produced with Alcuin. One
person, a student of calligraphy, was really
impressed.

Today, I'll start putting together a written

There are also commands to scroll the screen up one line (^E) and down one line (^Y).
(These commands are not available on small systems, such as the PDP-11 or Xenix for
the PC-XT.)

Movement within a Screen


You can also keep your current screen or view of the file and move around within the
screen using:
H home (top) line on screen
M middle line on screen
L last line on screen
nH to nth line from the top
nL to nth line from the bottom

The H command moves the cursor from anywhere on the screen to the first, or home,
line. The M command moves to the middle line, L to the last. To move to the line
below the first line, use 2H.

Today, I'll start
putting together a
written plan that
shows the different
strategies for the
[2H: move to the second line; the cursor lands on the p of putting]

These screen movement commands can also be used for editing. For example, dH
deletes to the top line shown on the screen.

Movement within Lines


Within the current screen there are also commands to move by line. You have already
learned the line movement commands $ and 0.
RETURN beginning of next line
^ to first nonblank character of current line
+ beginning of next line
- beginning of previous line

Going through a demo
session gave me a much
better understanding
of the product.
[-: go to start of previous line]

The ^ command moves to the first character of the line, ignoring any spaces or tabs.
(0, by contrast, moves to the first position of the line, even if that position is blank.)

Movement by Text Blocks


Another way that you can think of moving through a vi file is by text blocks: words,
sentences, or paragraphs. You have already learned to move forward and backward by
word (w or b).
e end of word
E end of word (ignore punctuation)
( beginning of previous sentence
) beginning of next sentence
{ beginning of previous paragraph
} beginning of next paragraph
The vi program locates the end of a sentence by finding a period followed by at
least two spaces, or a period as the last nonblank character on a line. If you have left
only a single space following a period, the sentence won’t be recognized.
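As a sketch of the sentence rule just described (and only that rule), the following Python regular expression finds the periods that end sentences. The pattern SENTENCE_END and the function sentence_ends are assumptions for illustration, not vi's actual implementation.

```python
import re

# Assumed reading of the rule above: a sentence ends at a period followed by
# at least two spaces, or at a period that is the last nonblank character on
# its line. (Other terminators and closing quotes are not modeled.)
SENTENCE_END = re.compile(r"\.(?= {2}|[ \t]*$)", re.MULTILINE)


def sentence_ends(text):
    """Offsets of the periods that this rule treats as ending a sentence."""
    return [match.start() for match in SENTENCE_END.finditer(text)]


sample = "One.  Two. Three.\nFour"
print(sentence_ends(sample))  # the period after Two has only one space
```

The period after Two is skipped because only one space follows it, which is the single-space case the text warns about.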
A paragraph is defined as text up to the next blank line, or up to one of the
default paragraph macros (.IP, .P, .PP, or .QP) in the mm or ms macro packages.
The macros that are recognized as paragraph separators can be customized with
the :set command, as described in Chapter 7.

In our conversation
last Thursday, we ...

Going through a demo
session gave me ...
[{: go to start of previous paragraph]

Most people find it easier to visualize moving ahead, so the forward commands
are generally more useful.
Remember that you can combine numbers with movement. For example, 3)
moves ahead three sentences. Also remember that you can edit using movement commands:
d) deletes to the end of the current sentence, 2y} copies (yanks) two paragraphs
ahead.
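The paragraph rule described earlier in this section, which { and } (and edits such as 2y}) rely on, can be sketched the same way. The helper paragraph_boundaries is hypothetical, and it checks only the default macros listed in the text; customizations made with :set are not modeled.

```python
# Assumed reading of the default rule described earlier in this section:
# a paragraph boundary is a blank line or a line beginning with one of the
# paragraph macros .IP, .P, .PP, or .QP.
PARAGRAPH_MACROS = {".IP", ".P", ".PP", ".QP"}


def paragraph_boundaries(lines):
    """Indexes of lines that this sketch treats as paragraph boundaries."""
    boundaries = []
    for index, line in enumerate(lines):
        fields = line.split()
        blank = not fields
        macro = line.startswith(".") and bool(fields) and fields[0] in PARAGRAPH_MACROS
        if blank or macro:
            boundaries.append(index)
    return boundaries


draft = ["In our conversation", "last Thursday, we", "",
         "Going through a demo", ".PP", "Today, I'll start", ".IP 1"]
print(paragraph_boundaries(draft))
```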

Movement by Searches
One of the most useful ways to move around quickly in a large file is by searching for
text, or, more properly, for a pattern of characters. The pattern can include a “wild-
card” shorthand that lets you match more than one character. For example, you can
search for a misspelled word or each occurrence of a variable in a program.
The search command is the slash character (/). When you enter a slash, it
appears on the bottom line of the screen; then type in the pattern (a word or other string
of characters) that you want to find:
/text<RETURN> search forward for text

A space before or after text will be included in the search. As with all bottom-line com-
mands, press RETURN to finish.
The search begins at the cursor and moves forward, wrapping around to the start
of the file if necessary. The cursor will move to the first occurrence of the pattern (or
the message “Pattern not found” will be shown on the status line if there is no match).
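The wrap-around behavior of a forward search can be sketched in Python. The function search_forward is hypothetical and matches the pattern as a literal string, whereas vi treats it as a pattern that can include wildcards; it returns None where vi would report “Pattern not found”.

```python
def search_forward(lines, row, col, pattern):
    """Return (row, col) of the next occurrence of pattern after the cursor,
    wrapping to the top of the file; None stands for "Pattern not found".

    Simplification: pattern is matched as a literal string, whereas vi
    treats it as a regular expression.
    """
    hit = lines[row].find(pattern, col + 1)   # rest of the current line
    if hit != -1:
        return row, hit
    for step in range(1, len(lines) + 1):     # later lines, then wrap around
        r = (row + step) % len(lines)
        hit = lines[r].find(pattern)
        if hit != -1:
            return r, hit
    return None


screen = ["Today, I'll start", "putting together a",
          "written plan that", "shows the different"]
print(search_forward(screen, 2, 8, "shows"))  # from the p of plan
print(search_forward(screen, 3, 6, "th"))     # no later th, so the search wraps
```

In the second call no th remains after the cursor, so the search wraps to the top of the text and stops at together.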
If you wanted to search for the pattern shows:

Today, I'll start
putting together a
written plan that
shows the different
[/shows<CR>: search for shows]
The cursor moves from the p of plan to the beginning of shows, and /shows
remains displayed on the bottom line.
[/th<CR>: search for th]
The cursor moves to the th at the start of the, and /th remains displayed on
the bottom line.

The search proceeds forward from the present position in the file. You can give any
combination of characters; a search does not have to be for a complete word.
You can also search backwards using the ? command:
?text<RETURN> search backward for text

The last pattern that you searched for remains available throughout your editing
session. After a search, instead of repeating your original keystrokes, you can use a
command to search again for the last pattern.
n repeat search in same direction
N repeat search in opposite direction
/<RETURN> repeat search in forward direction
?<RETURN> repeat search in backward direction
Because the last pattern remains available, you can search for a pattern, do some
work, and then search again for the pattern without retyping by using n, N, /, or ?.
The direction of your search (/=forwards, ?=backwards) is displayed at the bottom left
of the screen.
Continuing the previous example, the pattern th is still available to search for:

Today, I'll start
putting together a
written plan that
shows the different
[n: search for next th]
[?<CR>: search back for th]
[N: repeat search in opposite direction]

This section has given only the barest introduction to searching for patterns. Chapter 7
will teach more about pattern matching and its use in making global changes to a file.

Current Line Searches


There is also a miniature version of the search command that operates within the current
line. The command f moves the cursor to the next instance of the character you name.
Semicolons can then be used to repeat the “find.” Note, however, that the f com-
mand will not move the cursor to the next line.
fx find (move cursor to) next occurrence of x in the line, where x can be
any character
; repeat previous find command

Suppose that you are editing on this line:

Today, I'll start
[f': find first ' in line; the cursor moves to the apostrophe in I'll]

Use df' to delete up to and including the named character (in this instance ').
This command is useful in deleting or copying partial lines.
The t command works just like f, except it positions the cursor just before the
character searched for. As with f, a numeric prefix will locate the nth
occurrence. For example:

Today, I'll start
[2ta: place cursor before 2nd a in line; the cursor lands on the t of start]
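Counted f and t motions reduce to finding the nth occurrence of a character after the cursor. In this Python sketch, find_in_line is a hypothetical helper that returns the new cursor column, or None when the motion fails because the line has too few occurrences.

```python
def find_in_line(line, col, char, count=1, till=False):
    """Model [count]fx and [count]tx: return the new cursor column, or None
    when the line has fewer than count occurrences of char after the cursor
    (the search never continues onto the next line)."""
    pos = col
    for _ in range(count):
        pos = line.find(char, pos + 1)
        if pos == -1:
            return None
    return pos - 1 if till else pos


text = "Today, I'll start"
print(find_in_line(text, 0, "'"))                       # f' from the T
print(find_in_line(text, 0, "a", count=2, till=True))   # 2ta from the T
```

The t variant simply stops one column short of the character that f would reach.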

Movement by Line Numbers


A file contains sequentially numbered lines, and you can move through a file by speci-
fying line numbers. Line numbers are useful for identifying the beginning and end of
large blocks of text you want to edit. Line numbers are also useful for programmers
because compiler error messages refer to line numbers. Line numbers are also used by
ex commands, as you will learn in Chapter 7.

If you are going to move by line numbers, you need a way to identify line
numbers. Line numbers can be displayed on the screen using the :set nu option
described in Chapter 7. In vi, you can also display the current line number on the
bottom of the screen.
The command ^G displays the following on the bottom of your screen: the
current line number, the total number of lines in the file, and what percentage of the
total the present line number represents. For example, for the file letter, ^G might
display:
"letter" line 10 of 40 --25%--
^G is used to display the line number to use in a command, or to orient yourself if you
have been distracted from your editing session.
The G (go to) command uses a line number as a numeric argument, and moves to
the first position on that line. For instance, 44G moves the cursor to the beginning of
line 44. The G command without a line number moves the cursor to the last line of the
file.
Two single quotes ('') return you to the beginning of the line you were originally
on. Two backquotes (``) return you to your original position exactly. If you
have issued a search command (/ or ?), `` will return the cursor to its position when
you started the search.
The total number of lines shown with ^G can be used to give yourself a rough
idea of how many lines to move. If you are on line 10 of a 1000-line file:
"ch01" line 10 of 1000 --1%--

and know that you want to begin editing near the end of that file, you could give an
approximation of your destination with:
800G

Movement by line number can get you around quickly in a large file.
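The percentage on the ^G status line is simple arithmetic. The Python sketch below assumes the figure is the current line times 100 divided by the total, truncated to a whole number; ctrl_g_status is a hypothetical name, and individual vi versions may round differently.

```python
def ctrl_g_status(filename, line, total_lines):
    """Build a status line shaped like the ^G examples in the text.

    Assumption: the percentage is line * 100 / total_lines truncated to an
    integer; the exact rounding may differ between vi versions.
    """
    percent = line * 100 // total_lines
    return f'"{filename}" line {line} of {total_lines} --{percent}%--'


print(ctrl_g_status("letter", 10, 40))
print(ctrl_g_status("ch01", 10, 1000))
```

Read the other way, the same proportion tells you that a line about 80 percent of the way through a 1000-line file is near line 800, which is where 800G takes you.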

Session 3: Beyond the Basics


You have already been introduced to the basic vi editing commands, i, c, d, and
y. This session expands on what you already know about editing. You will learn

additional ways to enter vi;
how to customize vi;
how to combine all edits with movement commands;
additional ways to enter insert mode;
how to use buffers that store deletions, yanks, and your last command;
how to mark your place in a file.

Command-Line Options
There are other options to the vi command that can be helpful. You can open a file
directly to a specific line number or pattern. You can also open a file in read-only
mode. Another option recovers all changes to a file that you were editing when the sys-
tem crashes.

Advancing to a Specific Place


When you begin editing an existing file, you can load the file and then move to the first
occurrence of a pattern or to a specific line number. You can also combine the open
command, vi, with your first movement by search or by line number. For example:
$ vi +n letter
opens letter at line number n. The following:
$ vi + letter

opens letter at the last line. And:

$ vi +/pattern letter

opens letter at the first occurrence of pattern.


To open the file letter and advance directly to the line containing Alcuin,
enter:
$ vi +/Alcuin letter

Today I'll start putting together a
written plan that presents the different
strategies for the Alcuin
[the cursor is placed on the A of Alcuin]

There can be no spaces in the pattern because characters after a space are interpreted as
filenames.
If you have to leave an editing session before you are finished, you can mark your
place by inserting a pattern such as ZZZ or HERE. Then when you return to the file,
all you have to remember is /ZZZ or /HERE.

Read-only Mode
There will be times that you want to look at a file, but you want to protect that file from
inadvertent keystrokes and changes. (You might want to call in a lengthy file to practice
vi movements, or you might want to scroll through a command file or program.)
If you enter a file in read-only mode, you can use all the vi movement commands, but
you cannot change the file with any edits. To look at your file letter in read-only
mode, you can enter either:
$ vi -R letter
or:
$ view letter

Recovering a Buffer
Occasionally, there will be a system failure while you are editing a file. Ordinarily, any
edits made after your last write (save) are lost. However, there is an option, -r, which
lets you recover the edited buffer at the time of a system crash. (A system program
called preserve saves the buffer as the system is going down.)
When you first log in after the system is running again, you will receive a mail
message stating that your buffer is saved. The first time that you call in the file, use the
-r option to recover the edited buffer. For example, to recover the edited buffer of the
file letter after a system crash, enter:
$ vi -r letter
If you first call in the file without using the -r option, your buffered edits are lost.
You can force the system to preserve your buffer even when there is not a crash
by using the command :pre. You may find this useful if you have made edits to a
file, then discover you can’t save your edits because you don’t have write permission.
(You could also just write a copy of the file out under another name or in a directory
where you do have write permission.)

Customizing vi
A number of options that you can set as part of your editing environment affect how
vi operates. For example, you can set a right margin that will cause vi to wrap lines
automatically, so you don’t need to insert carriage returns.
You can change options from within vi by using the :set command. In
addition, vi reads an initialization file in your home directory called .exrc for further
operating instructions. By placing set commands in this file, you can modify the
way vi acts whenever you use it.
You can also set up .exrc files in local directories to initialize various options
that you want to use in different environments. For example, you might define one set
of options for editing text, but another set for editing source programs. The .exrc
file in your home directory will be executed first, then the one in your current
directory.
0 Learning vi 0 51

Finally, if the shell variable EXINIT is set in your environment (with the
Bourne shell export command, or the C shell setenv command), any commands
it contains will be executed by vi on startup. If EXINIT is set, it will be used
instead of .exrc; vi will not take commands from both.
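As a minimal sketch, assuming a Bourne-compatible shell, lines like the following could go in your .profile. The option values are illustrative only. Note that a single set command may list several options.

```sh
# Bourne shell (for example, in $HOME/.profile):
# options applied at the start of every vi or ex session.
# The option values are examples only.
EXINIT='set wrapmargin=10 window=20'
export EXINIT

# C shell equivalent (for example, in $HOME/.login):
#   setenv EXINIT 'set wrapmargin=10 window=20'
```

Because EXINIT takes the place of .exrc, anything you would otherwise put in .exrc has to go into this one variable.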

The set Command


There are two types of options that can be changed with the set command: toggle
options, which are either on or off, and options that take a numeric or string value (such
as the location of a margin or the name of a file).
Toggle options may be on or off by default. To turn a toggle option on, the
command is:
:set option

To turn a toggle option off, the command is:


:set nooption

For example, to specify that pattern searches should ignore case, you type:
:set ic

If you want vi to return to being case-sensitive in searches, give the command:


:set noic

Some options have values. For example, the option window sets the number of
lines shown in the screen “window.” You set values for these options with an equals
sign (=). For example:
:set window=20

During a vi session, you can check what options are available. The command:
:set all

displays the complete list of options, including options that you have set and defaults
that vi has chosen. The display will look something like this:

noautoindent      open               tabstop=8
autoprint         prompt             taglength=0
noautowrite       noreadonly         term=wy50
nobeautify        redraw             noterse
directory=/tmp    remap              timeout
noedcompatible    report=5           ttytype=wy50
noerrorbells      scroll=11          warn
hardtabs=8        sections=AhBhChDh  window=20
noignorecase      shell=/bin/csh     wrapscan
nolisp            shiftwidth=8       wrapmargin=10
nolist            noshowmatch        nowriteany
magic             noslowopen
mesg              paragraphs=IPLPPPQP LIpplpipbb
number            tags=tags /usr/lib/tags
nooptimize

You can also ask about the setting for any individual option by name, using the
command:
:set option?

The command :set shows options that you have specifically changed, or set, either in
your .exrc file or during the current session. For example, the display might look
like this:
number window=20 wrapmargin=10

See Appendix A for a description of what these options mean.

The .exrc File
The .exrc file that controls the vi environment for you is in your home directory.
Enter into this file the set options that you want to have in effect whenever you use
vi or ex.
The .exrc file can be modified with the vi editor, like any other file. A
sample .exrc file might look like this:
set wrapmargin=10 window=20

Because the file is actually read by ex before it enters visual mode (vi), commands in
.exrc should not have a preceding colon.
Alternate Environments
You can define alternate vi environments by saving option settings in an .exrc file
that is placed in a local directory. If you enter vi from that directory, the local
.exrc file will be read in. If it does not exist, the one in your home directory will be
read in.
For example, you might want to have one set of options for programming:
set number lisp autoindent sw=4 tags=/usr/lib/tags terse
and another set of options for text editing:
set wrapmargin=15 ignorecase
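As a sketch of setting up two such environments from the shell, the commands below create two hypothetical directories (progs and docs, under a temporary directory) and give each one its own .exrc containing the option lines shown above. Be aware that some modern vi implementations (Vim and nvi among them) read a local .exrc only when the exrc option is set, as a safeguard against untrusted files.

```sh
# Two hypothetical working directories, each with its own .exrc.
base=$(mktemp -d)            # stand-in for wherever your projects live
mkdir "$base/progs" "$base/docs"

# Options for editing program source
printf '%s\n' 'set number lisp autoindent sw=4 tags=/usr/lib/tags terse' \
    > "$base/progs/.exrc"

# Options for editing text
printf '%s\n' 'set wrapmargin=15 ignorecase' > "$base/docs/.exrc"

# Starting vi from docs would then pick up the text-editing options:
#   cd "$base/docs" && vi letter
```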

Local .exrc files are especially useful when you define abbreviations, which are
described in Chapter 7.

Some Useful Options


As you can see when you type :set all, there are many options. Most options are
used internally by vi and aren’t usually changed. Others are important in certain
cases, but not in others (for example, noredraw and window can be useful on a
dialup line at a low baud rate). Appendix A contains a brief description of each option.
We recommend that you take some time to play with option setting; if an option looks
interesting, try setting it (or unsetting it) and watch what happens while you edit. You
may find some surprisingly useful tools.

There is one option that is almost essential for editing nonprogram text. The
wrapmargin option specifies the size of the right margin that will be used to
autowrap text as you type. (This saves manually typing carriage returns.) This option
is in effect if its value is set to greater than 0. A typical value is 10 or 15:
set wrapmargin=15

There are also three options that control how vi acts in conducting a search. By
default, it differentiates between uppercase and lowercase (foo does not match Foo),
wraps around to the beginning of the file during a search (this means you can begin
your search anywhere in the file and still find all occurrences), and recognizes wildcard
characters when matching patterns. The default settings that control these options are
noignorecase, wrapscan, and magic, respectively. To change any of these
defaults, set the opposite toggles: ignorecase, nowrapscan, or nomagic.
Another useful option is shiftwidth. This option was designed to help
programmers properly indent their programs, but it can also be useful to writers. The >>
and << commands can be used to indent (or un-indent) text by shiftwidth
characters. The position of the cursor on the line doesn’t matter; the entire line will be
shifted. The shiftwidth option is set to 8 by default, but you can use :set to
change this value.
Give the >> or << command a numeric prefix to affect more than one line. For
example:
10>>

will indent 10 lines, starting with the current line, by shiftwidth.

Edits and Movement


You have learned the edit commands c, d, and y, and how to combine them with
movements and numbers (such as 2cw or 4dd). Since that point, you have added
many more movement commands to your repertoire. Although the fact that you can
combine edit commands with movement is not a “new” concept to you, Table 3-2
gives you a feel for the many editing options you now have.

TABLE 3-2. Combining vi Commands

From Cursor to      Change      Delete      Copy
Bottom of screen    cL          dL          yL
Next line           c+          d+          y+
Next sentence       c)          d)          y)
Next paragraph      c}          d}          y}
Pattern             c/pattern   d/pattern   y/pattern
End of file         cG          dG          yG
Line number 13      c13G        d13G        y13G

You can also combine numbers with any of the commands in Table 3-2 to
multiply them. For example, 2c) changes the next two sentences. Although this table may
seem forbidding, experiment with combinations and try to understand the patterns.
When you find how much time and effort you can save, combinations of change and
movement keys will no longer seem obscure, but will readily come to mind.

More Ways to Insert Text


You have inserted text before the cursor with the sequence:
itext<ESC>
There are many insert commands. The difference between them is that they insert text
at different positions relative to the cursor:
a append text after cursor
A append text to end of current line
i insert text before cursor
I insert text at beginning of line
o open new line below cursor for text
O open new line above cursor for text
R overstrike existing characters with new characters
All these commands leave you in insert mode. After inserting text, remember to press
ESC to escape back to command mode.
The A (append) and I (insert) commands save you from having to move the
cursor to the end or beginning of the line before invoking insert mode. For example, A
saves one keystroke over $a. Although one keystroke might not seem like a
timesaver, as you become a more adept (and impatient) editor, you’ll want to omit any
unnecessary keystrokes.
There are other combinations of commands that work together naturally. For
example, ea is useful for appending new text to the end of a word. (It sometimes
helps to train yourself to recognize such frequent combinations so that invoking them
becomes automatic.)

Using Buffers
While you are editing, you have seen that your last deletion (d or x) or yank (y) is
saved in a buffer (a place in stored memory). You can access the contents of that buffer
and put the saved text back in your file with the put command (p or P).
The last nine deletions are stored by vi in numbered buffers. You can access
any of these numbered buffers to restore any (or all) of the last nine deletions. You can
also place yanks (copied text) in buffers identified by letters. You can fill up to 26
buffers (a through z) with yanked text and restore that text with a put command any
time in your editing session.

The vi program also saves your last edit command (insert, change, delete, or
yank) in a buffer. Your last command is available to repeat or undo with a single
keystroke.

Recovering Deletions
Being able to delete large blocks of text at a single bound is all well and good, but what
if you mistakenly delete 53 lines that you need? There is a way to recover any of your
past nine deletions, which are saved in numbered buffers. The last deletion is saved in
buffer 1; the second-to-last in buffer 2, and so on.
To recover a deletion, type " (quotation mark), identify the buffered text by
number, and then give the put command. For example, to recover your second-to-last
deletion from buffer 2, type:
"2p

Sometimes it's hard to remember what's in the last nine buffers. Here's a trick
that can help.
The . command (repeat last command) has a special meaning when used with p
and u. The p command will print the last deletion or change, but 2p will print the
last two. By combining p, . (dot), and u (undo), you can step back through the
numbered buffers.
The "1p command will put the last deletion, now stored in buffer 1, back into
your text. If you then type u, it will go away. But when you type the . command,
instead of repeating the last command ("1p), it will show the next buffer as if you'd
typed "2p. You can thus step back through the buffers. For example, the sequence:
"1pu.u.u.u.u.

will show you, in sequence, the contents of the last six numbered buffers.

Yanking to Named Buffers


With unnamed buffers, you have seen that you must put (p or P) the contents of the
buffer before making any other edit, or the buffer is overwritten. You can also use y
with a set of 26 named buffers (a through z), which are specifically for copying and
moving text. If you name a buffer to store the yanked text, you can place the contents
of the named buffer at any time during your editing session.
To yank into a named buffer, precede the yank command with a quotation mark
(") and the character for the name of the buffer you want to load. For example:
"dyy yank current line into buffer d
"a6yy yank next six lines into buffer a

After loading the named buffers and moving to the new position, use p or P to
put the text back.
"dP put buffer d before cursor
"ap put buffer a after cursor

Figure: With the cursor on the first line of the six-line paragraph beginning “In our conversation last Thursday, we discussed a documentation project...”, "a6yy yanks the six lines to buffer a (vi reports “6 lines yanked”). With the cursor moved to the line “Alcuin product.”, "ap puts a copy of buffer a after the cursor.

There is no way to put part of a buffer into the text; it is all or nothing.
Named buffers allow you to make other edits before placing the buffer with p.
After you know how to travel between files without leaving vi, you can use named
buffers to selectively transfer text between files.
You can also delete text into named buffers, using much the same procedure. For
example:
"a5dd delete five lines into buffer a
If you specify the buffer name with a capital letter, yanked or deleted text will be
appended to the current contents of the buffer. For example:
"byy yank current line into buffer b
"B5dd delete five lines and append to buffer b
3} move down three paragraphs
"bP insert the six lines from buffer b above the cursor
When you put text from a named buffer, a copy still remains in that buffer; you can
repeat the put as often as you like until you quit your editing session or replace the text
in the buffer.
For example, suppose you were preparing a document with some repetitive
elements, such as the skeleton for each page of the reference section in a manual. You
could store the skeleton in a named buffer, put it into your file, fill in the blanks, then
put the skeleton in again each time you need it.

Marking Your Place


During a vi session, you can mark your place in the file with an invisible
“bookmark,” perform edits elsewhere, then return to your marked place. In the command
mode:
mx marks current position with x (x can be any letter)
'x moves cursor to beginning of line marked by x
`x moves cursor to character marked by x
`` returns to previous mark or context after a move

Figure: With the cursor in the letter text (“putting together a / written plan that”), mx marks the position and G moves to the end of the file, where the letter closes with “Sincerely, Fred Caslon.”

Place markers are set only during the current vi session; they are not stored in the file.

Other Advanced Edits


You may wonder why we haven’t discussed global changes, moving text between files,
or other advanced ex topics. The reason is that, to use these tools, it helps to learn
more about ex and a set of UNIX pattern-matching tools that we discuss together in
Chapter 7.
CHAPTER 4

nroff and troff

The vi editor lets you edit text, but it is not much good at formatting. A text file such
as program source code might be formatted with a simple program like pr, which
inserts a header at the top of every page and handles pagination, but otherwise prints the
document exactly as it appears in the file. But for any application requiring the
preparation of neatly formatted text, you will use the nroff (“en-roff”) or troff
(“tee-roff”) formatting program.
These programs are used to process an input text file, usually coded or “marked
up” with formatting instructions. When you use a wysiwyg program like most word
processors, you use commands to lay out the text on the screen as it will be laid out on
the page. With a markup language like that used by nroff and troff, you enter
commands into the text that tell the formatting program what to do.
Our purpose in this chapter is twofold. We want to introduce the basic formatting
codes that you will find useful. But at the same time, we want to present them in the
context of what the formatter is doing and how it works. If you find this chapter
rough going (especially if this is your first exposure to nroff/troff), skip ahead
to either Chapter 5 or Chapter 6 and become familiar with one of the macro packages,
ms or mm; then come back and resume this chapter. We assume that you are reading
this book because you would like more than the basics, that you intend to master the
complexities of nroff/troff. As a result, this chapter is somewhat longer and
more complex than it would be if the book were an introductory user’s guide.

Conventions
To distinguish input text and requests shown in examples from formatter output,
we have adopted the convention of showing “page corners” around output from
nroff or troff. Output from nroff is shown in the same constant-width
typeface as other examples:

Here is an example of nroff output.

0 n r o f f and t r o f f 0 59

Output from troff is shown in the same typeface as the text, but with the size of the
type reduced by one point, unless the example calls for an explicit type size:

Here is an example of troff output.

In representing output, compromises sometimes had to be made. For example, when
showing nroff output, we have processed the example separately with nroff, and
read the results back into the source file. However, from there, they have been typeset
in a constant-width font by troff. As a result, there might be slight differences from
true nroff output, particularly in line length or page size. However, the context
should always make clear just what is being demonstrated.

What the Formatter Does


Take a moment to think about the things you do when you format a page on a wysiwyg
device such as a typewriter:

You set aside part of the page as the text area. This requires setting top,
bottom, left, and right margins.
You adjust the lines that you type so they are all approximately the same
length and fit into the designated text area.
You break the text into syntactic units such as paragraphs.
You switch to a new page when you reach the bottom of the text area.

Left to themselves, nroff or troff will do only one of these tasks: they will
adjust the length of the lines in the input file so that they come out even in the output
file. To do so, they make two assumptions:

They assume that the line length is 6.5 inches.


They assume that a blank line in the input signals the start of a new paragraph.
The last line of the preceding text is not adjusted, and a blank line is placed in
the output.

The process of filling and adjusting is intuitively obvious; we’ve all done much the
same thing manually when using a typewriter or had it done for us by a wysiwyg word
processor. However, especially when it comes to a typesetting program like troff,
there are ramifications to the process of line adjustment that are not obvious. Having a
clear idea of what is going on will be very useful later. For this reason, we’ll examine
the process in detail.

Line Adjustment
There are three parts to line adjustment: filling, justification, and hyphenation. Filling
is the process of making all lines of text approximately equal in length. When working
on a typewriter, you do this automatically, simply by typing a carriage return when the
line is full. Most word-processing programs automatically insert a carriage return at the
end of a line, and we have seen how to set up vi to do so as well.
However, nroff and troff ignore carriage returns in the input except in a
special “no fill” mode. They reformat the input text, collecting all input lines into
even-length output lines, stopping only when they reach a blank line or (as we shall see
shortly) a formatting instruction that tells them to stop. Lines that begin with one or
more blank spaces are not filled, but trailing blank spaces are trimmed. Extra blank
spaces between words on the input line are preserved, and the formatter adds an extra
blank space after each period, question mark, or exclamation point that ends an input line.
Justification is a closely related feature that should not be confused with filling.
Filling simply tries to keep lines approximately the same length; justification adjusts the
space between words so that the ends of the lines match exactly.
By default, nroff and troff both fill and justify text. Justification implies
filling, but it is possible to have filling without justification. Let’s look at some
examples. First, we’ll look at a paragraph entered in vi. Here’s a paragraph from the letter
you entered in the last chapter, modified so that it offers to prepare not just a user’s
guide for the Alcuin illuminated lettering software, but a reference manual as well. In
the course of making the changes, we’ve left a short line in the middle of the paragraph.
In our conversation last Thursday, we discussed a
documentation project that would produce a user’s guide
and reference manual
for the Alcuin product. Yesterday, I received the product
demo and other materials that you sent me.

Now, let’s look at the paragraph after processing by nroff:

In our conversation last Thursday, we discussed a
documentation project that would produce a user’s
guide and reference manual for the Alcuin product.
Yesterday, I received the product demo and other
materials that you sent me.
The paragraph has been both filled and justified. If the formatter were told to fill, but
not to justify, the paragraph would look like this:

In our conversation last Thursday, we discussed a
documentation project that would produce a user’s guide
and reference manual for the Alcuin product. Yesterday,
I received the product demo and other materials that
you sent me.

As you can see, nroff justified the text in the first example by adding extra space
between words.
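The difference between filling and justification can be made concrete with a small model. The shell function below is a minimal sketch, not nroff's actual algorithm. It uses awk to count characters (as nroff does for a typewriter-like printer), breaks lines greedily at a given width, and treats a blank input line as a paragraph break. When asked to justify, it pads the interword gaps with whole spaces and gives any leftover spaces to the leftmost gaps, which is an arbitrary choice. It ignores leading spaces, the extra space after sentences, and hyphenation. The name fill_sketch and its two arguments are hypothetical.

```sh
# fill_sketch WIDTH JUSTIFY   (hypothetical helper, simplified model)
#   WIDTH   - output line length, in characters
#   JUSTIFY - 1 to pad gaps so lines end evenly, 0 for filled, ragged-right text
# Reads text on standard input; writes formatted text to standard output.
fill_sketch() {
    awk -v width="$1" -v justify="$2" '
    # Print the n words collected in w[]. "last" is true for the final
    # line of a paragraph, which is never padded.
    function emit(last,    i, k, len, gaps, q, r, pad, out) {
        if (n == 0) return
        len = 0
        for (i = 1; i <= n; i++) len += length(w[i])
        gaps = n - 1
        if (gaps > 0) {                   # share the free space among gaps
            q = int((width - len) / gaps)
            r = (width - len) % gaps      # leftover spaces go to the left
        }
        out = w[1]
        for (i = 2; i <= n; i++) {
            pad = (justify == 1 && !last) ? q + ((i - 1) <= r) : 1
            for (k = 0; k < pad; k++) out = out " "
            out = out w[i]
        }
        print out
        n = 0; cur = 0
    }
    NF == 0 { emit(1); print ""; next }   # blank line ends a paragraph
    {
        for (i = 1; i <= NF; i++) {
            need = (n == 0) ? length($i) : cur + 1 + length($i)
            if (n > 0 && need > width) { emit(0); need = length($i) }
            w[++n] = $i
            cur = need
        }
    }
    END { emit(1) }
    '
}
```

For example, printf '%s\n' 'aa bb cc dd' | fill_sketch 9 1 prints "aa  bb cc" (padded out to exactly 9 characters) followed by "dd". With 0 as the second argument the first line is "aa bb cc", which is filled but ragged.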
Most typewritten material is filled but not justified. In printer’s terms, it is typed
ragged right. Books, magazines, and other typeset materials, by contrast, are usually
right justified. Occasionally, you will see printed material (such as ad copy) in which
the right end of each line is justified, but the left end is ragged. It is for this reason that
we usually say that text is right or left justified, rather than simply justified.
When it is difficult to perform filling or justification or both because a long word
falls at the end of a line, the formatter has another trick to fall back on (one we are all
familiar with): hyphenation.
The nroff and troff programs perform filling, justification, and hyphenation
in much the same way as a human typesetter used to set cold lead type. Human
typesetters used to assemble a line of type by placing individual letters in a tray until
each line was filled. There were several options for filling as the typesetter reached the
end of the line:

The next word might fit exactly.
The next word might fit if the typesetter squeezed the words a little closer
together.
The next word could be hyphenated, with part put on the current line and part
on the next line.

If, in addition to being filled, the text was to be justified, there was one additional issue:
after the line was approximately the right length, space needed to be added between
each word so that the line length came out even.
Just like the human typesetter they replace, nroff and troff assemble one
line of text at a time, measuring the length of the line and making adjustments to the
spacing to make the line come out even (assuming that the line is to be justified). Input
lines are collected into a temporary storage area, or buffer, until enough text has been
collected for a single output line. Then that line is output, and the next line collected.
It is in the process of justification that you see the first significant difference
between the two programs. The nroff program was designed for use with
typewriter-like printers; troff was designed for use with phototypesetters.
A typewriter-style printer has characters all of the same size; an i takes up the
same amount of space as an m. (Typical widths are 1/10 or 1/12 inch per character.)
And although some printers (such as daisywheel printers) allow you to change the style
of type by changing the daisywheel or thimble, you can usually have only one typeface
at a time.
A typesetter, by contrast, uses typefaces in which each letter takes up an amount
of space proportional to its outline. The space allotted for an i is quite definitely
narrower than the space allotted for an m. The use of variable-width characters makes the
job of filling and justification much more difficult for troff than for nroff.
Where nroff only needs to count characters, troff has to add up the width of
each character as it assembles the line. (Character widths are defined by a “box”
around the character, rather than by its natural, somewhat irregular shape.)
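The contrast between counting characters and summing widths can be sketched in a few lines of shell and awk. The width values here are invented units chosen only for illustration. They are not real Times Roman metrics, which troff reads from font tables stored on the system. The name line_width is hypothetical.

```sh
# line_width TEXT   (hypothetical helper)
# Prints two numbers for TEXT: the character count (all nroff needs for a
# fixed-width printer) and the sum of per-character widths (what troff
# must compute for a proportional font).
line_width() {
    printf '%s\n' "$1" | awk '
    BEGIN {
        # HYPOTHETICAL widths in arbitrary units, for illustration only
        wd["i"] = 3; wd["l"] = 3; wd["m"] = 10; wd[" "] = 4
        other = 6                    # assumed width of any unlisted character
    }
    {
        total = 0
        for (k = 1; k <= length($0); k++) {
            c = substr($0, k, 1)
            total += (c in wd) ? wd[c] : other
        }
        print length($0), total
    }'
}
```

Run as line_width mill and line_width mmmm, it prints "4 19" and "4 40". To nroff both strings are four characters long, but with these made-up widths one is more than twice as wide as the other for troff.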

The troff program also justifies by adding space between words, but because
the variable-width fonts it uses are much more compact, it fits more on a line and
generally does a much better job of justification.*
There’s another difference as well. Left to itself, nroff will insert only full
spaces between words; that is, it might put two spaces between one pair of words, and
three between another, to fill the line. If you call nroff with the -e option, it will
attempt to make all interword spaces the same size (using fractional spaces if possible).
But even then, nroff will only succeed if the output device allows fractional spacing.
The troff program always uses even interword spacing.
Here’s the same paragraph filled and justified by troff:

In our conversation last Thursday, we discussed a documentation project that would
produce a user’s guide and reference manual for the Alcuin product. Yesterday, I
received the product demo and other materials that you sent me.
To make matters still more difficult, typeset characters come in a variety of
different designs, or fonts. A font is a set of alphabetic, numeric, and punctuation
characters that share certain design elements. Typically, fonts come in families of several
related typefaces. For example, this book is typeset for the most part in the Times
Roman family of typefaces. There are three separate fonts:
roman
bold
italic
Typesetting allows for the use of multiple fonts on the same page, as you can see from
the mixture of fonts throughout this book. Sometimes the fonts are from the same
family, as with the Times Roman, Times Bold, and Times Italic just shown. However, you
can see other fonts, such as Helvetica, in the running headers on each page. Bold and
italic fonts are generally used for emphasis; in computer books such as this, a
constant-width typewriter font is used for examples and other “computer voice” statements.
Even within the same font family, the width of the same character varies from
font to font. For example, a bold “m” is slightly wider than a Roman “m.”
To make things still more complicated, the same font comes in different sizes. If
you look at this book, you will notice that the section headings within each chapter are
slightly larger for emphasis. Type sizes are measured in units called points. We’ll talk
more about this later, but to get a rough idea of what type sizes mean, simply look at
the current page. The body type of the book is 10-point Times Roman; the next
heading is 12-point Times Bold. The spacing between lines is generally proportional to the
point size, instead of fixed, as it is with nroff.

*The very best typesetting programs have the capability to adjust the space between individual characters as
well. This process is called kerning. SoftQuad Publishing Software in Toronto sells an enhanced version of
troff called SQroff that does support kerning.

The troff program gets information about the widths of the various characters
in each font from tables stored on the system in the directory /usr/lib/font.
These tables tell troff how far to move over after it has output each character on the
line.
We’ll talk more about troff later. For the moment, you should be aware that
the job of the formatting program is much more complicated when typesetting than it is
when preparing text for typewriter-style printers.

Using nroff
As mentioned previously, left to themselves, nroff and troff perform only
rudimentary formatting. They will fill and justify the text, using a default line length of 6.5
inches, but they leave no margins, other than the implicit right margin caused by the
line length. To make this clearer, let’s look at the sample letter from the last chapter
(including the edit we made in this chapter) as it appears after formatting with nroff.
First, let’s look at how to invoke the formatter. The nroff program takes as an
argument the name of a file to be formatted:
$ nroff letter

Alternatively, it can take standard input, allowing you to preprocess the text with some
other program before formatting it:
$ tbl report | nroff
There are numerous options to nroff. They are described at various points in this
book (as appropriate to the topic) and summarized in Appendix B.
One basic option is -T, which specifies the terminal (printer) type for which
output should be prepared. Although nroff output is fairly straightforward, some
differences between printers can significantly affect the output. (For example, one printer
may perform underlining by backspacing and printing an underscore under each
underlined letter, and another may do it by suppressing a newline and printing the
underscores in a second pass over the line.) The default device is the Teletype Model 37
terminal, a fairly obsolete device. Other devices are listed in Appendix B. If you
don’t recognize any of the printers or terminals, the safest type is probably lp:
$ nroff -Tlp file

In examples in this book, we will leave off the -T option, but you may want to
experiment, and use whichever type gives the best results with your equipment.
Like most UNIX programs, nroff prints its results on standard output. So,
assuming that the text is stored in a file called letter, all you need to do is type:
$ nroff letter

A few moments later, you should see the results on the screen. Because the letter will
scroll by quickly, you should pipe the output of nroff to a paging program such as
pg or more:

$ nroff letter | pg
or out to a printer using lp or lpr:
$ nroff letter | lp

Using troff
The chief advantage of troff over nroff is that it allows different types of
character sets, or fonts, and so lets you take full advantage of the higher-quality printing
available with typesetters and laser printers. There are a number of requests, useful only in
troff, for specifying fonts, type sizes, and the vertical spacing between lines. Before
we describe the actual requests though, we need to look at a bit of history.
The troff program was originally designed for a specific typesetter, the Wang
C/A/T. Later, it was modified to work with a wide range of output devices. We’ll
discuss the original version of troff (which is still in use at many sites) first, before
discussing the newer versions. The C/A/T typesetter was designed in such a way that it
could use only four fonts at one time.
(Early phototypesetters worked by projecting light through a film containing the
outline of the various characters. The film was often mounted on a wheel that rotated
to position the desired character in front of the light source as it flashed, thus
photographing the character onto photographic paper or negative film. Lenses enlarged and
reduced the characters to produce various type sizes. The C/A/T typesetter had a wheel
divided into four quadrants, onto which one could mount four different typefaces.)
Typically, the four fonts were the standard (roman), bold, and italic fonts of the
same family, plus a “special” font that contained additional punctuation characters,
Greek characters (for equations), bullets, rules, and other nonstandard characters.
Figure 4-1 shows the characters available in these standard fonts.

The Coming of ditroff
Later, troff was modified to support other typesetters and, more importantly (at least
from the perspective of many readers of this book), laser printers. The later version of
troff is often called ditroff (for device-independent troff), but many UNIX
systems have changed the name of the original troff to otroff and simply call
ditroff by the original name, troff.
The ditroff program has not been universally available because, when it was
developed, it was “unbundled” from the basic UNIX distribution and made part of a
separate product called Documenter’s Workbench, or DWB. UNIX system manufacturers
have the option not to include this package, although increasingly they have been
including it. Versions of DWB are also available separately from third-party vendors.
The newer version of troff allows you to specify any number of different
fonts. (You can mount fonts at up to ten imaginary “positions” with .fp and can
request additional fonts by name.)
0 n r o f f and t r o f f 0 65

[Figure: sample character sets (lowercase and uppercase alphabets, digits, punctuation, the fractions 1/4, 1/2, and 3/4, and the ligatures fi and fl) for Times Roman, Times Italic, Times Bold, and the Special Mathematical Font]

Fig. 4-1. The Four Standard Fonts

There may also be different font sizes available, and there are some additional
commands for line drawing (ditroff can draw curves as well as straight lines). For the
most part, though, ditroff is very similar to the original program, except in the
greater flexibility it offers to use different output devices.
One way to find out which version of troff you have on your system (unless
you have a program explicitly called ditroff) is to list the contents of the directory
/usr/lib/font:
$ ls -F /usr/lib/font
devlj/
devps/
ftB
ftI
ftR
ftS

If there are one or more subdirectories whose names begin with the letters dev, your
system is using ditroff. Our system supports both ditroff and otroff, so
we have both device subdirectories (for ditroff) and font files (for otroff)
directly in /usr/lib/font.
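The directory check just described can also be scripted. The following Python sketch is only an illustration: it assumes the /usr/lib/font layout shown above (subdirectories beginning with dev for ditroff, plain ft* width files for otroff), and the function name troff_flavors is hypothetical.

```python
import os

def troff_flavors(font_dir="/usr/lib/font"):
    """Guess which troff versions a font directory supports (hypothetical helper).

    Assumes the layout described in the text: subdirectories whose names
    begin with "dev" indicate ditroff; plain files such as ftR, ftI, ftB,
    and ftS directly in the directory indicate the original troff.
    """
    flavors = set()
    try:
        entries = os.listdir(font_dir)
    except FileNotFoundError:
        return flavors  # no font directory at all
    for name in entries:
        path = os.path.join(font_dir, name)
        if name.startswith("dev") and os.path.isdir(path):
            flavors.add("ditroff")
        elif name.startswith("ft") and os.path.isfile(path):
            flavors.add("otroff")
    return flavors
```

On a system laid out like ours, troff_flavors() would report both versions; an empty result means neither convention was found.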
We’ll talk more about font files later. For the moment, all you need to know is
that they contain information about the widths of the characters in various fonts for a
specific output device.
Contrary to what a novice might expect, font files do not contain outlines of the
characters themselves. For a proper typesetter, character outlines reside in the typesetter
itself. All troff sends out to the typesetter are character codes and size and position
information.
However, troff has increasingly come to be used with laser printers, many of
which use downloadable fonts. An electronic image of each character is loaded from
the computer into the printer’s memory, typically at the start of each printing job.
There may be additional “font files” containing character outlines in this case, but
these files are used by the software that controls the printer, and have nothing to do
with troff itself. In other cases, font images are stored in ROM (read-only memory)
in the printer.
If you are using a laser printer, it is important to remember that troff itself has
nothing to do with the actual drawing of characters or images on the printed page. In a
case like this, troff simply formats the page, using tables describing the widths of
the characters used by the printer, and generates instructions about page layout, spacing,
and so on. The actual job of driving the printer is handled by another program,
generally referred to as a printer driver or troff postprocessor.
To use troff with such a postprocessor, you will generally need to pipe the
output of troff to the postprocessor and from there to the print spooler:
$ troff file | postprocessor | lp
If you are using the old version of troff, which expects to send its output directly to
the C/A/T typesetter, you need to specify the -t option, which tells troff to use
standard output. If you don’t, you will get the message:
Typesetter busy.

(Of course, if by any chance you are connected to a C/A/T typesetter, you don’t need
this option. There are several other options listed in Appendix B that you may find
useful.) When you use ditroff, on the other hand, you will need to specify the -T
command-line option that tells it what device you are using. The postprocessor will
then translate the device-independent troff output into instructions for that particular
type of laser printer or typesetter. For example, at our site, we use troff with an
Apple LaserWriter and Pipeline Associates’ devps postprocessor, which translates
troff output for the LaserWriter. Our command line looks something like this:
$ ditroff -Tps files | devps | lp

You can print the same file on different devices, simply by changing the -T option and
the postprocessor. For example, you can print drafts on a laser printer, then switch to a
typesetter for final output without making extensive changes to your files. (To actually
direct output to different printers, you will also have to specify a printer name as an
option to the lp command. In our generic example, we simply use lp without any
options, assuming that the appropriate printer is connected as the default printer.)
Like all things in life, this is not always as easy as it sounds. Because the fonts
used by different output devices have different widths even when the nominal font
names and sizes are the same, pagination and line breaks may be different when you
switch from one device to another.
The job of interfacing ditroff to a wide variety of output devices is becoming
easier because of the recent development of industry-wide page description languages
like Adobe Systems’ PostScript, Xerox’s Interpress, and Imagen’s DDL. These page
description languages reside in the printer, not the host computer, and provide a device-
independent way of describing placement of characters and graphics on the page.
Rather than using a separate postprocessor for each output device, you can now
simply use a postprocessor to convert troff output to the desired page description
language. For example, you can use Adobe Systems’ TranScript postprocessor (or an
equivalent postprocessor like devps from Pipeline Associates) to convert troff
output to PostScript, and can then send the PostScript output to any one of a number of
typesetters or laser printers.
From this point, whenever we say troff, we are generally referring to
ditroff. In addition, although we will continue to discuss nroff as it differs from
troff, our emphasis is on the more capable program. It is our opinion that the
growing availability of laser printers will make troff the program of choice for almost all
users in the not too distant future.
However, you can submit a document coded for troff to nroff with entirely
reasonable results. For the most part, formatting requests that cannot be handled by
nroff are simply ignored. And you can submit documents coded for nroff to
troff, though you will then be failing to use many of the characteristics that make
troff desirable.

The Markup Language


The nroff and troff markup commands (often called requests) typically consist
of one or two lowercase letters and stand on their own line, following a period or
apostrophe in column one. Most requests are reasonably mnemonic. For example, the
request to leave space is:
.sp
There are also requests that can be embedded anywhere in the text. These requests are
commonly called escape sequences. Escape sequences usually begin with a backslash
(\). For example, the escape sequence \l will draw a horizontal line. Especially in
troff, escape sequences are used for line drawing or for printing various special
characters that do not appear in the standard ASCII character set. For instance, you enter
\(bu to get •, a bullet.
There are three classes of formatting instructions:

• Instructions that have an immediate one-time effect, such as a request to space
  down an inch before outputting the next line of text.
• Instructions that have a persistent effect, such as requests to set the line length
  or to enable or disable justification.
• Instructions that are useful for writing macros. There is a “programming
  language” built into the formatter that allows you to build up complex requests
  from sequences of simpler ones. As part of this language there are requests for
  storing values into variables called strings and number registers, for testing
  conditions and acting on the result, and so on.

For the most part, we will discuss the requests used to define macros, strings, and
number registers later in this book.
At this point, we want to focus on understanding the basic requests that control
the basic actions of the formatter. We will also learn many of the most useful requests
with immediate, one-time effects. Table 4-1 summarizes the requests that you will use
most often.

TABLE 4-1. Basic nroff/troff Requests

Request  Meaning                        Request  Meaning
.ad      Enable line adjustment         .na      No justification of lines
.br      Line break                     .ne      Need lines to end of page
.bp      Page break                     .nf      No filling of lines
.ce      Center next line               .nr      Define and set number register
.de      Define macro                   .po      Set page offset
.ds      Define string                  .ps      Set point size
.fi      Fill output lines              .so      Switch to source file and return
.ft      Set current font               .sp      Space
.in      Set indent                     .ta      Set tab stop positions
.ls      Set double or triple spacing   .ti      Set temporary indent
.ll      Specify line length            .vs      Set vertical line spacing

Looking at nroff Output


When we discussed the basic operations of the text formatter, we saw that nroff and
troff perform rudimentary formatting. They will fill and justify the text, using a
default line length of 6.5 inches, but they leave no margins, other than the implicit right
margin caused by the line length.
To make this clearer, let’s look at the sample letter from the last chapter as it
appears after formatting with nroff, without any embedded requests, and without
using any macro package. From Figure 4-2, you can see immediately that the formatter
has adjusted all of the lines, so that they are all the same length, even in the address
block of the letter, where we would have preferred them to be left as they were. Blank
lines in the input produce blank lines in the output, and the partial lines at the ends of
paragraphs are not adjusted.
The most noticeable aspect of the raw formatting is a little difficult to reproduce
here, though we’ve tried: no top or left margin is automatically allocated by nroff.
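The filling and adjusting described above can be approximated in a few lines of Python. This is a simplified sketch, not nroff’s actual algorithm: it assumes fixed-width characters (as on a typewriter-like device), fills words greedily, hands the extra blanks to the leftmost gaps first (nroff’s own choice of which gaps to widen may differ), leaves the last line of each paragraph unadjusted, and treats a blank input line as a paragraph separator. The function name fill_and_adjust is hypothetical.

```python
def fill_and_adjust(text, width):
    """Fill words into lines of at most `width` characters and adjust both
    margins by widening interword spaces. The last line of each paragraph
    is output without adjustment."""
    out = []
    for para in text.split("\n\n"):
        lines, current = [], []
        for word in para.split():
            # Start a new line if this word (plus one space) would overflow.
            if current and len(" ".join(current + [word])) > width:
                lines.append(current)
                current = []
            current.append(word)
        if current:
            lines.append(current)
        for i, line in enumerate(lines):
            if i == len(lines) - 1 or len(line) == 1:
                out.append(" ".join(line))  # partial line: not adjusted
                continue
            gaps = len(line) - 1
            extra = width - len("".join(line)) - gaps  # blanks still needed
            pieces = []
            for j, word in enumerate(line[:-1]):
                # One normal space plus an even share of the extra blanks;
                # any remainder goes to the leftmost gaps.
                pad = 1 + extra // gaps + (1 if j < extra % gaps else 0)
                pieces.append(word + " " * pad)
            pieces.append(line[-1])
            out.append("".join(pieces))
        out.append("")  # blank line between paragraphs
    return "\n".join(out[:-1])
```

Feeding it the first two lines of the address block with a 20-character width reproduces the effect seen in Figure 4-2: the address lines run together and are padded out to the full width.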

Turning Filling On and Off


Even though filling of uneven text lines resulting from editing is probably the most
basic action we want from the formatter, it is not always desirable. For example, in our
letter, we don’t want the address block to be filled. There are two requests we could
use to correct the problem: .br (break) and .nf (no fill).
A .br request following a line outputs the current contents of the line buffer and
starts the next line, even though the buffer is not yet full. To produce a properly
formatted address block, we could enter the following requests in the file:
Mr. John Fust
.br
Vice President, Research and Development
.br
Gutenberg Galaxy Software
.br
Waltham, Massachusetts 02159
Each individual input line will be output without filling or justification. We could also
use the .nf request, which tells nroff to stop filling altogether. Text following this
request will be printed by the formatter exactly as it appears in the input file. Use this
request when you want text to be laid out as it was typed in.
Because we do want the body of the letter to be filled, we must turn filling back
on with the .fi (fill) request:
April 1, 1987
.nf
Mr. John Fust
Vice President, Research and Development
Gutenberg Galaxy Software
Waltham, Massachusetts 02159
.fi
Dear Mr. Fust:
April 1, 1987

Mr. John Fust Vice President, Research and
Development Gutenberg Galaxy Software Waltham,
Massachusetts 02159

Dear Mr. Fust:

In our conversation last Thursday, we discussed a
documentation project that would produce a user's
guide and reference manual for the Alcuin product.
Yesterday, I received the product demo and other
materials that you sent me. After studying them,
I want to clarify a couple of points:

Going through a demo session gave me a much better
understanding of the product. I confess to being
amazed by Alcuin. Some people around here,
looking over my shoulder, were also astounded by
the illustrated manuscript I produced with Alcuin.
One person, a student of calligraphy, was really
impressed.

Tomorrow, I'll start putting together a written
plan that presents different strategies for
documenting the Alcuin product. After I submit
this plan, and you have had time to review it,
let's arrange a meeting at your company to discuss
these strategies.

Thanks again for giving us the opportunity to bid
on this documentation project. I hope we can
decide upon a strategy and get started as soon as
possible in order to have the manual ready in time
for first customer ship. I look forward to meeting
with you towards the end of next week.

Sincerely,

Fred Caslon

Fig. 4-2. A Raw nroff-formatted File


If you look carefully at the previous example, you will probably notice that we entered
the two formatting requests on blank lines in the letter. If we were to format the letter
now, here is what we’d get:

April 1, 1987
Mr. John Fust
Vice President, Research and Development
Gutenberg Galaxy Software
Waltham, Massachusetts 02159
Dear Mr. Fust:

As you may notice, we’ve lost the blank lines that used to separate the date from the
address block, and the address block from the salutation. Lines containing formatting
requests do not result in any space being output (unless they are spacing requests), so
you should be sure not to inadvertently replace blank lines when entering formatting
codes.

Controlling Justification
Justification can be controlled separately from filling by the .ad (adjust) request.
(However, filling must be on for justification to work at all.) You can adjust text at
either margin or at both margins.
Unlike the .br and .nf requests introduced earlier, .ad takes an argument, which
specifies the type of justification you want:

l   adjust left margin only
r   adjust right margin only
b   adjust both margins
c   center filled line between margins

There is another related request, .na (no adjust). Because the text entered in a
file is usually left justified to begin with, turning justification off entirely with .na
produces similar results to .ad l in most cases.
However, there is an important difference. Normally, if no argument is given to
the .ad request, both margins will be adjusted. That is, .ad is the same as .ad b.
However, following an .na request, .ad reverts to the value last specified. That is,
the sequence:
.ad r
Some text
.ad l
Some text
.ad
Some text
will adjust both margins in the third block of text. However, the sequence:
.ad r
Some text
.na
Some text
.ad
Some text
will adjust only the right margin in the third block of text.
It’s easy to see where you would use .ad b or .ad l. Let’s suppose that
you would like a ragged margin for the body of your letter, to make it look more like it
was prepared on a typewriter. Simply follow the .fi request we entered previously
with .ad l.
Right-only justification may seem a little harder to find a use for. Occasionally,
you’ve probably seen ragged-left copy in advertising, but that’s about it. However, if
you think for a moment, you’ll realize that it is also a good way to get a single line over
to the right margin.
For example, in our sample letter, instead of typing all those leading spaces before
the date (and having it fail to come out flush with the margin anyway), we could enter
the lines:
.ad r
April 1, 1987
.ad b

As it turns out, this construct won’t quite work. If you remember, when filling is
enabled, nroff and troff collect input in a one-line buffer and only output the
saved text when the line has been filled. There are some non-obvious consequences of
this that will ripple all through your use of nroff and troff. If you issue a
request that temporarily sets a formatting condition, then reset it before the line is
output, your original setting may have no effect. The result will be controlled by the
request that is in effect at the time the line is output, not at the time that it is first
collected in the line buffer.
Certain requests cause implicit line breaks (the equivalent of carriage returns on a
typewriter) in the output, but others do not. The .ad request does not cause a break.
Therefore, a construction like:
.ad r
April 1, 1987
.ad b
Mr. John Fust

will result in the following output:

April 1, 1987 Mr. John Fust

and not:

                                   April 1, 1987
Mr. John Fust
To make sure that you get the desired result from a temporary setting like this, be sure
to follow the line to be affected with a condition that will cause a break.* For instance,
in the previous example, you would probably follow the date with a blank line or an
.sp request, either of which will normally cause a break. If you don’t, you should put
in an explicit break, as follows:
.ad r
April 1, 1987
.br
.ad b
Mr. John Fust
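The point about the one-line buffer can be made concrete with a small model. This Python sketch is an illustration only: collect is a hypothetical name, it ignores line length (so text leaves the buffer only at a break or at the end of input), it does not model .ad without an argument, and its set of breaking requests is taken from the footnote to this section. It records which adjustment mode is in effect when each line is actually output.

```python
def collect(lines):
    """Return (adjust_mode, text) for each output line.

    Text accumulates in a one-line buffer until a break; the mode recorded
    is whatever .ad setting is in effect when the line is finally output,
    not when its text was read.
    """
    # Requests listed in the footnote as causing a break.
    BREAKING = {".bp", ".br", ".ce", ".fi", ".nf", ".sp", ".in", ".ti"}
    mode, buffer, output = "b", [], []

    def flush():
        if buffer:
            output.append((mode, " ".join(buffer)))
            buffer.clear()

    for line in lines:
        if line.startswith("."):
            name, *args = line.split()
            if name in BREAKING:
                flush()
            if name == ".ad" and args:  # .ad alone is not modeled here
                mode = args[0]
        else:
            buffer.extend(line.split())
    flush()  # end of input forces out whatever is left
    return output
```

Without the .br, the date and the name leave the buffer together under .ad b; with it, the date is output while .ad r is still in effect.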

A final point about justification: the formatter adjusts a line by widening the blank
space between words. If you do not want the space between two words adjusted or split
across output lines, precede the space with a backslash. This is called an unpaddable
space.
There are many obscure applications for unpaddable spaces; we will mention
them as appropriate. Here’s a simple one that may come in handy: nroff and
troff normally add two blank spaces after a period, question mark, or exclamation
point. The formatter can’t distinguish between the end of a sentence and an
abbreviation, so if you find the extra spacing unaesthetic, you might follow an abbreviation like
Mr. with an unpaddable space: Mr.\ John Fust.

Hyphenation
As pointed out previously, hyphenation is closely related to filling and justification, in
that it gives nroff and troff some additional power to produce filled and justified
lines without large gaps.
The nroff and troff programs perform hyphenation according to a general
set of rules. Occasionally, you need to control the hyphenation of particular words.
You can specify either that a word not be hyphenated or that it be hyphenated in a
certain way. You can also turn hyphenation off entirely.

Specifying Hyphenation for Individual Words


There are two ways to specify that a word be hyphenated a specific way: with the
.hw request and with the special hyphenation indicator \%.
The .hw (hyphenate word) request allows you to specify a small list of words
that should be hyphenated a specific way. The space available for the word list is small
(about 128 characters), so you should use this request only for words you use
frequently, and that nroff and troff hyphenate badly.

*The following requests cause a break:


.bp .br .ce .fi .nf .sp .in .ti
All other requests can be interspersed with text without causing a break. In addition, as discussed later,
even these requests can be introduced with a special “no break” control character (' instead of .) so that
they too will not cause a break.
To use .hw, simply specify the word or words that constitute the exception list,
typing a hyphen at the point or points in the word where you would like it to be
hyphenated:
.hw hy-phen-a-tion

You can specify multiple words with one .hw request, or you can issue multiple .hw
requests as you need them.
However, if it is just a matter of making sure that a particular instance of a word
is hyphenated the way you want, you can use the hyphenation indicator character
sequence \%. As you type the word in your text, simply type the two characters \% at
each acceptable hyphenation point, or at the front of the word if you don’t want the
word to be hyphenated at all:
\%acknowledge the word acknowledge will not be hyphenated
ac\%know\%ledge the word acknowledge can be hyphenated only
at the specified points
This character sequence is the first instance we have seen of a formatting request that
does not consist of a request name following a period in column one. We will see
many more of these later. This sequence is embedded right in the text but does not
print out.
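To see what the \% markings amount to, the following Python sketch splits a marked word into the plain word and the character offsets where a break is allowed. It is a simplified model of the behavior just described, not the formatter’s code; hyphenation_points and its indicator parameter are hypothetical names. Passing indicator="-" reads an .hw entry such as hy-phen-a-tion the same way.

```python
def hyphenation_points(marked, indicator="\\%"):
    """Split a word typed with hyphenation indicators into the plain word
    and the offsets at which it may be broken.

    A leading indicator means "never hyphenate this word", so no offsets
    are returned; otherwise only the marked positions are allowed.
    """
    parts = marked.split(indicator)
    word = "".join(parts)
    if marked.startswith(indicator):
        return word, []
    points, offset = [], 0
    for part in parts[:-1]:
        offset += len(part)  # a break is allowed after this many characters
        points.append(offset)
    return word, points
```

For ac\%know\%ledge this gives the word acknowledge with breaks allowed after its 2nd and 6th characters; for \%acknowledge it gives no break points at all.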
In general, nroff and troff do a reasonable job with hyphenation. You will
need to set specific hyphenation points only in rare instances. You shouldn’t
even worry about hyphenation points unless you notice a bad break; then use either
.hw or \% to correct it.
The UNIX hyphen command can be used to print out all of the hyphenation
points in a file formatted with nroff or troff -a:
$ nroff options files | hyphen

or:
$ troff options -a files | hyphen
If your system doesn’t have the hyphen command, you can use grep instead:
$ nroff options files | grep -e '-$'
(The single quotation marks keep the shell from interpreting the pattern, and the -e
option keeps grep from taking the leading - as the beginning of an option.)

Turning Hyphenation Off and On


If you don’t want any hyphenation, use the .nh (no hyphenation) request. Even if you
do this, though, you should be aware that words already containing embedded hyphens,
em dashes, or hyphenation indicators (\%) will still be subject to hyphenation.
After you’ve turned hyphenation off, you can turn it back on with the .hy
(hyphenate) request. This request has a few twists. Not only does it allow you to turn
hyphenation on, it also allows you to adjust the hyphenation rules that nroff and
troff use. It takes the following numeric arguments:
0   turn hyphenation off
1   turn hyphenation on
2   do not hyphenate the last line on a page
4   do not hyphenate after the first two characters of a word
8   do not hyphenate before the last two characters of a word

Specifying .hy with no argument is the same as specifying .hy 1. The other
numeric values are additive. For example, .hy 12 (.hy 4 plus .hy 8) will keep
nroff and troff from breaking short syllables at the beginning or end of words,
and .hy 14 will put all three hyphenation restrictions into effect.
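Because the values are additive, each restriction behaves like a separate bit. The Python sketch below decodes a .hy argument using the meanings in the table above; describe_hy and HY_FLAGS are hypothetical names, and it assumes (as the .hy 12 and .hy 14 examples imply) that any nonzero value turns hyphenation on.

```python
# Meanings copied from the table above; each value is an independent bit.
HY_FLAGS = {
    2: "do not hyphenate the last line on a page",
    4: "do not hyphenate after the first two characters of a word",
    8: "do not hyphenate before the last two characters of a word",
}

def describe_hy(n=1):
    """Explain a .hy argument: 0 turns hyphenation off, and (by assumption)
    any other value turns it on with the restrictions whose bits are set."""
    if n == 0:
        return ["hyphenation off"]
    rules = ["hyphenation on"]
    for bit, meaning in sorted(HY_FLAGS.items()):
        if n & bit:
            rules.append(meaning)
    return rules
```

describe_hy(12) lists the two word-boundary restrictions, and describe_hy(14) adds the last-line-of-page rule as well, matching 2 + 4 + 8 = 14.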

Page Layout
Apart from the adjusted address block, the biggest formatting drawback that you prob-
ably noticed when we formatted the sample letter is that there was no left or top margin.
Furthermore, though it is not apparent from our one-page example, there is no bottom
margin either. If there were enough text in the input file to run onto a second page, you
would see that the text ran continuously across the page boundary.
In normal use, these layout problems would be handled automatically by either
the ms or mm macro packages (described later). Here, though, we want to understand
how the formatter itself works.
Let’s continue our investigation of the nroff and troff markup language
with some basic page layout commands. These commands allow you to affect the
placement of text on the page. Some of them (those whose descriptions begin with the
word set) specify conditions that will remain in effect until they are explicitly changed
by another instance of the same request. Others have a one-time effect.
As shown in Table 4-2, there are two groups of page layout commands, those that
affect horizontal placement of text on the page and those that affect vertical placement.
A moment’s glance at these requests will tell you that, before anything else, we need to
talk about units.

TABLE 4-2. Layout Commands

Horizontal layout
  .ll n   Set the line length to n
  .po n   Set the left margin (page offset) to n
  .in n   Indent the left margin to n
  .ti n   Temporarily indent the left margin to n
  .ce n   Center the following n lines
Vertical layout
  .pl n   Set the page length to n
  .sp n   Insert n spaces
  .bp n   Start a new page
  .wh n   Specify when (at what vertical position on the page) to execute a command
Units of Measure
By default, most nroff and troff commands that measure vertical distance (such
as .sp) do so in terms of a number of “lines” (also referred to as vertical spaces, or
vs). The nroff program has constant, device-dependent line spacing; troff has
variable line spacing, which is generally proportional to the point size. However, both
programs do allow you to use a variety of other units as well. You can specify spacing
in terms of inches and centimeters, as well as the standard printer’s measures picas and
points. (A pica is 1/6 of an inch; a point is about 1/72 of an inch. These units were
originally developed to measure the size of type, and the relationship between these two
units is not as arbitrary as it might seem. A standard 12-point type is 1 pica high.)
Horizontal measures, such as the depth of an indent, can also be specified using
any of these measures, as well as the printer’s measures ems and ens. These are relative
measures, originally based on the size of the letters m and n in the current type size and
typeface. By default, horizontal measures are always taken to be in ems.
There is also a relationship between these units and points and picas. An em is
always equivalent in width to the height of the character specified by the point size. In
other words, an em in a 12-point type is 12 points wide. An en is always half the size
of an em, or half of the current point size. The advantage of using these units is that
they are relative to the size of the type being used. This is unimportant in nroff,
but using these units in troff gives increased flexibility to change the appearance of
the document without recoding.
The nroff and troff programs measure not in any of these units, but in
device-dependent basic units. Any measures you specify are converted to basic units
before they are used. Typically, nroff measures in horizontal units of 1/240 of an
inch and otroff uses a unit of 1/432 inch. These units too are not as arbitrary as
they may seem. According to Joseph Ossanna’s Nroff/Troff User’s Manual (the
original, dense, and authoritative documentation on troff published by AT&T as part of
the UNIX Programmer’s Manual), the nroff units were chosen as “the least
common multiple of the horizontal and vertical resolutions of various typewriter-like output
devices.” The units for otroff were based on the C/A/T typesetter (the device for
which troff was originally designed), which could move in horizontal increments of
1/432 of an inch and in vertical increments three times as large, or 1/144 inch.
Units for ditroff depend on the resolution of the output device. For example, units
for a 300 dot-per-inch (dpi) laser printer will be 1/300 of an inch in either a vertical or a
horizontal direction. See Appendix D for more information on ditroff device
units.
You don’t need to remember the details of all these measures now. You can gen-
erally use the units that are most familiar to you, and we’ll come back to the others
when we need them.
To specify units, you simply need to add the appropriate scale indicator from
Table 4-3 to the numeric value you supply to a formatting request. For example, to
space down 3 inches rather than 3 lines, enter the request:
.sp 3i

The numeric part of any scale indicator can include decimal fractions. Before the
specified value is used, nroff and troff will round the value to the nearest number of
device units.
TABLE 4-3. Units of Measure

Indicator   Units
c           Centimeters
i           Inches
m           Ems
n           Ens
p           Points
P           Picas
u           Device units
v           Vertical spaces (lines)
none        Default
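The conversion from scaled values to basic units can be sketched in Python using the indicators in Table 4-3. The device parameters are assumptions chosen for illustration (otroff’s 432 units per inch, 10-point type, 12-point vertical spacing), to_basic_units is a hypothetical name, and the default argument stands in for whatever default scale the request uses.

```python
# Assumed device: the old troff's 432 basic units per inch, with 10-point
# type and 12-point vertical spacing. Change these to model another device.
RESOLUTION = 432
POINT_SIZE = 10
VERTICAL_SPACING = 12  # in points

def to_basic_units(value, default="m", res=RESOLUTION,
                   ps=POINT_SIZE, vs=VERTICAL_SPACING):
    """Convert a scaled value such as '3i' or '1.5c' to basic units.

    `default` stands in for the request's default scale: ems for
    horizontal requests, v for vertical ones.
    """
    per_point = res / 72  # treat a point as exactly 1/72 inch
    scale = {
        "c": res / 2.54,          # centimeters
        "i": res,                 # inches
        "m": ps * per_point,      # em: as wide as the point size
        "n": ps * per_point / 2,  # en: half an em
        "p": per_point,           # points
        "P": res / 6,             # picas: 1/6 inch
        "u": 1,                   # already in basic units
        "v": vs * per_point,      # vertical spaces (lines)
    }
    if value[-1] in scale:
        number, unit = value[:-1], value[-1]
    else:
        number, unit = value, default
    # Following the text, round to the nearest basic unit.
    return round(float(number) * scale[unit])
```

With these assumptions, .sp 3i means 1296 basic units, a bare .sp 3 means 216, and a 2m indent is 120 units; passing res=300 models the 300 dpi laser printer mentioned above.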

In fact, you can use any reasonable numeric expression with any request that
expects a numeric argument. However, when using arithmetic expressions, you have to
be careful about what units you specify. All of the horizontally oriented requests
(.ll, .in, .ti, .ta, .po, .lt, and .mc) assume you mean ems unless you
specify otherwise.
Vertically oriented requests like .sp assume v’s unless otherwise specified.
The only exceptions to this rule are .ps and .vs, which assume points by default,
but these are not really motion requests anyway.
As a result, if you make a request like:
.ll 7i/2

what you are really requesting is:
.ll 7i/2m

The request:
.ll 7i/2i

is not what you want either. In performing arithmetic, as with fractions, the formatter
converts scaled values to device units. In otroff, this means the previous
expression is really evaluated as:
.ll (7*432u)/(2*432u)

which comes out to a line length of just 3 device units.

If you really want half of 7 inches, you should specify the expression like this:
.ll 7i/2u

You could easily divide 7 by 2 yourself and simply specify 3.5i. The point of this
example is that when you are doing arithmetic (usually with values stored in variables
called number registers, more on these later), you will need to pay attention to the
interaction between units. Furthermore, because fractional device units are always
rounded down, you should avoid expressions like 7i/2.5u because this is equivalent
to 7i/2u.
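The pitfalls above follow from two rules: every operand is converted to whole device units before any arithmetic, and expressions are evaluated left to right. The Python sketch below models only that much. It is a hypothetical evaluator, not the formatter’s parser: it assumes otroff’s 432 units per inch and 10-point type, handles only the i, m, p, and u indicators with + - * /, and does not handle parentheses or relative (signed) arguments.

```python
import re

RES = 432  # assumed: the old troff's basic units per inch
PS = 10    # assumed current point size, used for ems

SCALE = {"i": RES, "m": PS * RES // 72, "p": RES // 72, "u": 1}

def operand(number, unit):
    """Convert one scaled number to basic units, dropping any fraction."""
    return int(float(number) * SCALE[unit])

def evaluate(expr, default="m"):
    """Evaluate a simple troff-style expression strictly left to right
    (no operator precedence) in whole basic units. Unscaled numbers get
    the request's default scale (ems for .ll)."""
    result, op = None, "+"
    for number, unit, operator in re.findall(r"(\d*\.?\d+)([impu]?)|([-+*/])", expr):
        if operator:
            op = operator
            continue
        value = operand(number, unit or default)
        if result is None:
            result = value
        elif op == "+":
            result += value
        elif op == "-":
            result -= value
        elif op == "*":
            result *= value
        else:  # "/": integer division, since everything is in whole units
            result //= value
    return result
```

Under these assumptions, 7i/2 gives 25 units (far too short), 7i/2i gives 3, and 7i/2u gives the intended 1512 units, or 3.5 inches; 7i/2.5u gives the same 1512 because 2.5u is truncated to 2u.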
78 0 UNlX Text Processing 0

In addition to absolute values, many nroff and troff requests allow you to
specify relative values, by adding a + or a - before the value. For example:
.ll -.5i

will subtract 1/2 inch from the current line length, whatever it is.

Setting Margins
In nroff and troff, margins are set by the combination of the .po (page offset)
and .ll (line length) requests. The .po request defines the left margin. The .ll
request defines how long each line will be after filling, and so implicitly defines the
right margin:

|<-- .po -->|<-------------- .ll -------------->|<-- right margin -->|

The nroff program's default line length of 6.5 inches is fairly standard for an
8 1/2-by-11 page; it allows for 1-inch margins on either side.
Assuming that we'd like 1 1/4-inch margins on either side of the page, we would
issue the following requests:
.ll 6i
.po 1.25i

This will give us 1 1/4 inches for both the right and left margins. The .po request
specifies a left margin, or page offset, of 1 1/4 inches. When the 6-inch line length is
added to this, it will leave a similar margin on the right side of the page.
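The margin arithmetic is simple enough to check in a few lines. This Python sketch works in inches and assumes 8 1/2-inch-wide paper; set_value and right_margin are hypothetical helpers. set_value mirrors the rule given earlier for relative arguments such as .ll -.5i: a leading + or - changes the current value, and anything else replaces it.

```python
def set_value(current, arg):
    """Apply a request argument: a leading + or - is relative to the
    current value; anything else is an absolute setting."""
    if arg.startswith(("+", "-")):
        return current + float(arg)
    return float(arg)

def right_margin(po, ll, page_width=8.5):
    """Space left at the right once the page offset and line length are
    used (all values in inches; 8.5 assumes 8 1/2-by-11 paper)."""
    return page_width - po - ll
```

With .po 1.25i and .ll 6i, right_margin(1.25, 6) is 1.25, the symmetrical layout above; after .ll -.5i the line length becomes 5.5 and the right margin grows to 1.75. With nroff’s page offset of 0 and the default 6.5-inch line, all of the leftover 2 inches falls on the right.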
Let's take a look at how our sample letter will format now. One paragraph of the
output should give you the idea.

        In   our   conversation   last   Thursday,   we
        discussed  a  documentation  project that would
        produce a user's guide and reference manual for
        the  Alcuin  product. Yesterday, I received the
        product demo and other materials that you sent me.

As we saw earlier, nroff assumes a default page offset of 0. Either you or the macro
package you are using must set the page offset. In troff, though, there is a default
page offset of 26/27 inch, so you can get away without setting this value.
(Keep in mind that all nroff output examples are actually simulated with
troff, and are reduced to fit on our own 5-inch wide printed page. As a result, the
widths shown in our example output are not exact, but are suggestive of what the actual
result would be on an 8 1/2-by-11 inch page.)
Setting Indents
In addition to the basic page offset, or left margin, you may want to set an indent, either
for a single line or an entire block of text. You may also want to center one or more
lines of text.
To do a single-line indent, as is commonly used to introduce a paragraph, use the
.ti (temporary indent) request. For example, if you followed the blank lines between
paragraphs in the sample letter with the request .ti 5, you’d get a result like this
from nroff:

...Yesterday, I received the product demo and other
materials that you sent me.

     Going through a demo session gave me a
much better understanding of the product. I
confess to being amazed by Alcuin...
The .in request, by contrast, sets an indent that remains in effect until it is changed.
For example, if you had entered the line .in 5 between the paragraphs (instead of
.ti 5), the result would have looked like this:

...Yesterday, I received the product demo and other
materials that you sent me.

     Going through a demo session gave me a
     much better understanding of the product.
     I confess to being amazed by Alcuin...
All succeeding paragraphs will continue to be indented, until the indent is reset. The
default indent (the value at the left margin) is 0.
These two indent requests can be combined to give a “hanging indent.”
Remember that you can specify negative values to many requests that take numeric
arguments. Here is the first case where this makes sense. Let’s say we would like to
modify the letter so that it numbers the points and indents the body of the numbered
paragraph:
...Yesterday, I received the product demo and other materials
that you sent me. After studying them, I want to clarify
a couple of points:

.in 4
.ti -4
1. Going through a demo session gave me a much better
understanding of the product. I confess to being amazed by
Alcuin...
80 0 UNlX Text Processing 0

The first line will start at the margin, and subsequent lines will be indented:

...Yesterday, I received the product demo and other
materials that you sent me. After studying them,
I want to clarify a couple of points:

1. Going through a demo session gave me a much
    better understanding of the product. I confess
    to being amazed by Alcuin...
To line up an indented paragraph like this in nroff, just count the number of characters
you want to space over, then use that number as the size of the indent. But this
trick is not so simple in troff. Because characters, and even spaces, are not of constant
width, it is more difficult to create a hanging indent. Ens are a good unit to use
for indents. Like ems, they are relative to the point size, but they are much closer to the
average character width than an em. As a result, they are relatively intuitive to work
with. An indent of 5n is about where you expect a 5-character indent to be from familiarity
with a typewriter.
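To see how big an en actually is, recall that in troff an em is as wide as the current point size (10 points in 10-point type) and an en is half an em. The sketch below is our own Python illustration of that conversion. It assumes 72 points to the inch, and the name ens_to_inches is hypothetical.

```python
POINTS_PER_INCH = 72


def ens_to_inches(ens, point_size):
    """Convert a distance in ens to inches at the given point size."""
    em_points = point_size       # an em equals the current point size
    en_points = em_points / 2    # an en is half an em
    return ens * en_points / POINTS_PER_INCH


# A 5n indent in 10-point type is 25 points, about a third of an inch.
print(round(ens_to_inches(5, 10), 3))  # 0.347
# The same 5n indent grows with the type: 30 points in 12-point type.
print(round(ens_to_inches(5, 12), 3))  # 0.417
```

Because the en scales with the point size, an indent given in ens stays in proportion to the text around it.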

Centering Output Lines


Centering is another useful layout tool. To center the next line, use the .ce request:
.ce
This line will be centered.

Here’s the result:

This line will be centered.

Centering takes into account any indents that are in effect. That is, if you have used
.in to specify an indent of 1 inch, and the line length is 5 inches, text will be centered
within the 4-inch span following the indent.
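The rule can be stated as arithmetic: a centered line starts halfway into whatever room is left in the span between the indent and the end of the line. The Python function below is a hypothetical model of that rule, not troff's own code. All measurements are in inches and are taken from the page offset.

```python
def centered_start(text_width, line_length, indent=0.0):
    """Return where a centered line begins, measured from the page offset.

    The centering span runs from the current indent to the end of the
    line length; the text is placed in the middle of that span.
    """
    span = line_length - indent
    return indent + (span - text_width) / 2


# 5-inch line length, 1-inch indent, a line of text 2 inches wide:
# the 4-inch span leaves 1 inch on each side, so the text starts at 2i.
print(centered_start(2.0, 5.0, indent=1.0))  # 2.0
# Without the indent, the same line would start at 1.5i.
print(centered_start(2.0, 5.0))              # 1.5
```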
To center multiple lines, specify a number as an argument to the request:
.ce 3
Documentation for the Alcuin Product

A Proposal Prepared by
Fred Caslon

Here’s the result:



Documentation for the Alcuin Product

A Proposal Prepared by
Fred Caslon

Notice that .ce centered all three text lines, ignoring the blank line between.
To center an indeterminately large number of lines, specify a very large number
with the .ce request, then turn it off by entering .ce 0:
.ce 1000
Many lines of text here.
.ce 0
In looking at the examples, you probably noticed that centering automatically disables
filling and justification. Each line is centered individually. However, there is also
the case in which you would like to center an entire filled and justified paragraph.
(This paragraph style is often used to set off quoted material in a book or paper.) You
can do this by using both the .in and .ll requests:
I was particularly interested by one comment that I
read in your company literature:

.in +5n
.ll -5n
The development of Alcuin can be traced back to our
founder’s early interest in medieval manuscripts.
He spent several years in the seminary before
becoming interested in computers. After he became
an expert on typesetting software, he resolved to
put his two interests together.
.in -5n
.ll +5n

Here’s the result:

I was particularly interested by one comment that I
read in your company literature:

     The development of Alcuin can be traced back to
     our founder’s early interest in medieval
     manuscripts. He spent several years in the
     seminary before becoming interested in computers.
     After he became an expert on typesetting
     software, he resolved to put his two interests
     together.

Remember that a line centered with .ce takes into account any indents in effect at the
time. You can visualize the relationship between page offset, line length, indents, and
centering as follows:

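[Figure: page offset (.po), line length (.ll), indent (.in), and a centered line (.ce) shown across the page]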

Setting Tabs
No discussion of how to align text would be complete without a discussion of tabs. A
tab, as anyone who has used a typewriter well knows, is a horizontal motion to a predefined
position on the line.
The problem with using tabs in nroff and troff is that what you see on the
screen is very different from what you get on the page. Unlike a typewriter or a
wysiwyg word processor, the editor/formatter combination presents you with two different
tab settings. You can set tabs in vi, and you can set them in nroff and
troff, but the settings are likely to be different, and the results on the screen
definitely unaesthetic.
However, after you get used to the fact that tabs will not line up on the screen in
the same way as they will on the printed page, you can use tabs quite effectively.
By default, tab stops are set every .8 inches in nroff and every .5 inches in
troff. To set your own tab stops in nroff or troff, use the .ta request. For
example:
.ta 1i 2.5i 3i
will set three tab stops, at 1 inch, 2½ inches, and 3 inches, respectively. Any previous
or default settings are now no longer in effect.
You can also set incremental tab stops. The request:
.ta 1i +1.5i +.5i
will set tabs at the same positions as the previous example. Values preceded with a
plus sign are added to the value of the last tab stop.
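A few lines of Python can model how relative values accumulate. This is a simplified sketch under our own assumptions: it accepts only inch values written like the examples above (1i, +1.5i, +.5i) and ignores alignment suffixes. The name resolve_tab_stops is hypothetical.

```python
def resolve_tab_stops(args):
    """Turn .ta arguments such as "1i" or "+1.5i" into absolute stops.

    A value with a leading "+" is added to the previous stop; any other
    value is an absolute position. Only inch values are handled.
    """
    stops = []
    for arg in args:
        relative = arg.startswith("+")
        value = float(arg.lstrip("+").rstrip("i"))
        if relative and stops:
            value += stops[-1]
        stops.append(value)
    return stops


print(resolve_tab_stops(["1i", "2.5i", "3i"]))     # [1.0, 2.5, 3.0]
print(resolve_tab_stops(["1i", "+1.5i", "+.5i"]))  # [1.0, 2.5, 3.0]
```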
You can also specify the alignment of text at a tab stop. Settings made with a
numeric value alone are left adjusted, just as they are on a typewriter. However, by
adding either the letter R or C to the definition of a tab stop, you can make text right
adjusted or centered on the stop.
For example, the following input lines (where a tab character is shown by the
symbol ⇥):
.nf
.ta 1i 2.5i 3.5i
⇥First⇥Second⇥Third
.fi

will produce the following output:

          First          Second    Third


But:
.nf
.ta 1i 2.5iR 3.5iC
⇥First⇥Second⇥Third
.fi

will produce:

          First    Second        Third
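On a fixed-pitch printer, where every character occupies one column, the three kinds of tab stop come down to simple column arithmetic. The sketch below assumes 10 characters per inch and rounds a centered string's offset down to a whole column. Both are our assumptions, so real nroff output may differ by a column. The function names are hypothetical.

```python
CHARS_PER_INCH = 10  # assumption: a 10-pitch, typewriter-like printer


def place_at_stop(text, stop_inches, align="L"):
    """Return the column where text begins for one tab stop.

    L: text starts at the stop.
    R: text ends at the stop.
    C: text is centered on the stop (offset rounded down).
    """
    stop = round(stop_inches * CHARS_PER_INCH)
    if align == "L":
        return stop
    if align == "R":
        return stop - len(text)
    if align == "C":
        return stop - len(text) // 2
    raise ValueError(f"unknown alignment {align!r}")


def set_line(fields):
    """Lay out (text, stop_inches, align) fields on one output line."""
    line = ""
    for text, stop_inches, align in fields:
        column = place_at_stop(text, stop_inches, align)
        # ljust pads with spaces up to the column; if the line is already
        # past it, the text is simply appended (overprinting is not modeled).
        line = line.ljust(column) + text
    return line


print(set_line([("First", 1, "L"), ("Second", 2.5, "L"), ("Third", 3.5, "L")]))
print(set_line([("First", 1, "L"), ("Second", 2.5, "R"), ("Third", 3.5, "C")]))
```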

Right-adjusted tabs can be useful for aligning numeric data. This is especially
true in troff, where characters (including blank spaces) vary in width, and,
as a result, you can’t just line things up by eye. If the numbers you want to align have
an uneven number of decimal positions, you can manually force right adjustment of
numeric data using the special escape sequence \0, which will produce a blank space
exactly the same width as a digit. For example (⇥ marks a tab character):
.nf
.ta 1iR
⇥500.2\0
⇥125.35
⇥150.\0\0
.fi

will produce:

    500.2
    125.35
    150.

As on a typewriter, if you have already spaced past a tab position (either by print-
ing characters, or with an indent or other horizontal motion), a tab in the input will push
text over to the next available tab stop. If you have passed the last tab stop, any tabs
present in the input will be ignored.
You must be in no-fill mode for tabs to work correctly. This is not just because
filling will override the effect of the tabs. Using .nf when specifying tabs is an
important rule of thumb; we’ll look at the reasoning behind it in Chapter 15.
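The \0 trick for numbers works because each \0 is exactly one digit wide. Padding the shorter fractions therefore gives every entry the same width to the right of its decimal point, and right adjustment then lines the points up. The following Python sketch models this on a monospaced device, where a digit is one column wide and \0 can be shown as a single space. The function name and sample values are our own.

```python
def right_align_at_stop(entries, stop_column):
    """Right-adjust each entry so that it ends at stop_column.

    In troff input, \\0 is a blank exactly as wide as a digit. On the
    monospaced device assumed here every digit is one column wide, so
    each \\0 is modeled as a single space.
    """
    lines = []
    for entry in entries:
        text = entry.replace("\\0", " ")
        lines.append(" " * (stop_column - len(text)) + text)
    return lines


# Column 10 corresponds to a 1iR stop at 10 characters per inch.
for line in right_align_at_stop(["500.2\\0", "125.35", "150.\\0\\0"], 10):
    print(repr(line))  # every decimal point ends up in the same column
```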

Underlining
We haven’t yet described how to underline text, a primary type of emphasis in
nroff, which lacks the troff ability to switch fonts for emphasis.
There are two underlining requests: .ul (underline) and .cu (continuous
underline). The .ul request underlines only printable characters (the words, but not
the spaces), and .cu underlines the entire text string.
These requests are used just like .ce. Without an argument, they underline the
text on the following input line. You can use a numeric argument to specify that more
than one line should be underlined.
Both of these requests produce italics instead of underlines in troff. Although
there is a request, .uf, that allows you to reset the underline font to some other font
than italics,* there is no way to have these requests produce underlining even in
troff. (The ms and mm macro packages both include a macro to do underlining in
troff, but this uses an entirely different mechanism, which is not explained until
Chapter 15.)

Inserting Vertical Space


As you have seen, a blank line in the input text results in a blank line in the output.
You can leave blank space on the page (for example, between the closing of a letter and
the signature) by inserting a number of blank lines in the input text.
However, particularly when you are entering formatting codes as you write, rather
than going back to code an existing file like our sample letter, it is often more
convenient to specify the spacing with the .sp request.
For example, you could type:
Sincerely,
.sp 3
Fred Caslon

In troff, the .sp request is even more important, because troff can space in
much finer increments.
For example, if we were formatting the letter with troff, a full space between
paragraphs would look like this:

In our conversation last Thursday, we discussed a documentation project that would
produce a user’s guide and reference manual for the Alcuin product. Yesterday, I
received the product demo and other materials that you sent me.

Going through a demo session gave me a better understanding of the product. I con-
fess to being amazed by Alcuin. Some people around here, looking over my
shoulder, were also astounded by the illuminated manuscript I produced with Alcuin.
One person, a student of calligraphy, was really impressed.
The output would probably look better if there were a smaller amount of space between
the paragraphs. If we replace the blank line between the paragraphs with the request .sp .5,
here is what we will get:

*This request is generally used when the document is being typeset in a font family other than Times
Roman. It might be used to set the “underline font” to Helvetica Italic, rather than the standard Italic.

In our conversation last Thursday, we discussed a documentation project that would
produce a user’s guide and reference manual for the Alcuin product. Yesterday, I
received the product demo and other materials that you sent me.

Going through a demo session gave me a much better understanding of the product.
I confess to being amazed by Alcuin. Some people around here, looking over my
shoulder, were also astounded by the illuminated manuscript I produced with Alcuin.
One person, a student of calligraphy, was really impressed.
Although it may not yet be apparent how this will be useful, you can also space to an
absolute position on the page, by inserting a vertical bar before the distance. The following:
.sp |3i

will space down to a position 3 inches from the top of the page, rather than 3 inches
from the current position.
You can also use negative values with ordinary relative spacing requests. For
example:
.sp -3

will move back up the page three lines. Of course, when you use any of these requests,
you have to know what you are doing. If you tell nroff or troff to put one line
on top of another, that’s exactly what you’ll get. For example:
This is the first line.
.sp -2
This is the second line.
.br
This is the third line.

will result in:

This is the second line.
This is the first line.   [overprinted with: This is the third line.]

Sure enough, the second line is printed above the first, but because we haven’t restored
the original position, the third line is then printed on top of the first.
When you make negative vertical motions, you should always make compensatory
positive motions, so that you end up at the correct position for future output. The previ-
ous example would have avoided disaster if it had been coded:
This is the first line.
.sp -2
This is the second line.
.sp
This is the third line.

(Notice that you need to space down one less line than you have spaced up because, in
this case, printing the second line “uses up” one of the spaces you went back on.)
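You can model the bookkeeping behind both versions by tracking a single vertical position, measured in lines. The Python sketch below is our own simplification, and its names are hypothetical. Each text item is one output line (as the .br or .sp in the examples guarantees), printed at the current position, which then moves down one line. A spacing item moves the position up or down. Any position that receives two lines is an overprint.

```python
def render(commands):
    """Track vertical position in whole lines and record what prints where.

    commands is a list of ("text", string) and ("sp", n) items. Each text
    item is printed at the current position, after which the position
    moves down one line; ("sp", n) moves it by n lines, and a negative n
    moves back up the page. Returns {position: [lines printed there]}.
    """
    page = {}
    position = 0
    for kind, value in commands:
        if kind == "text":
            page.setdefault(position, []).append(value)
            position += 1
        elif kind == "sp":
            position += value
        else:
            raise ValueError(f"unknown command {kind!r}")
    return page


first_try = render([("text", "first"), ("sp", -2),
                    ("text", "second"), ("text", "third")])
fixed = render([("text", "first"), ("sp", -2),
                ("text", "second"), ("sp", 1), ("text", "third")])

print(sorted(first_try.items()))  # [(-1, ['second']), (0, ['first', 'third'])]
print(sorted(fixed.items()))      # [(-1, ['second']), (0, ['first']), (1, ['third'])]
```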
These kinds of vertical motions are generally used for line drawing (e.g., for drawing
boxes around tables), in which all of the text is output, and the formatter then goes
back up the page to draw in the lines. At this stage, it is unlikely that you will find an
immediate use for this capability. Nonetheless, we are sure that a creative person,
knowing that it is there, will find it just the right tool for a job. (We’ll show a few
creative uses of our own later.)
You probably aren’t surprised that a typesetter can go back up the page. But you
may wonder how a typewriter-like printer can go back up the page like this. The
answer is that it can’t. If you do any reverse line motions (and you do when you use
certain macros in the standard packages, or the tbl and eqn preprocessors), you
must pass the nroff output through a special filter program called col to get all of
the motions sorted out beforehand, so that the page will be printed in the desired order:
$ nroff files | col | lp

Double or Triple Spacing


Both nroff and troff provide a request to produce double- or triple-spaced output
without individually adjusting the space between each line. For example:
.ls 2
Putting this at the top of the file produces double-spaced lines. An argument of 3 specifies
triple-spaced lines.

Page Transitions
If we want space at the top of our one-page letter, it is easy enough to insert the command:
.sp 1i

before the first line of the text. However, nroff and troff do not provide an
easy way of handling page transitions in multipage documents.
By default, nroff and troff assume that the page length is 11 inches. However,
neither program makes immediate use of this information. There is no default top
and bottom margin, so text output begins on the first line, and goes to the end of the
page.
The .bp (break page) request allows you to force a page break. If you do this,
the remainder of the current page will be filled with blank lines, and output will start
again at the top of the second page. If you care to test this, insert a .bp anywhere in
the text of our sample letter, then process the letter with nroff. If you save the
resulting output in a file:
$ nroff letter > letter.out

you will find that the text following the .bp begins on line 67 (11 inches at 6 lines per
inch equals 66 lines per page).
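The same arithmetic tells you where any page begins in the saved output. Here is a small Python sketch that assumes, as above, 6 lines per inch and pages padded out to their full length. The name first_line_of_page is hypothetical.

```python
LINES_PER_INCH = 6  # typewriter-like printer, as in the text


def first_line_of_page(page, page_length_inches=11):
    """Return the 1-based line number in the nroff output where `page`
    starts, assuming every page is padded out to its full length."""
    lines_per_page = page_length_inches * LINES_PER_INCH
    return (page - 1) * lines_per_page + 1


print(first_line_of_page(2))  # 67
print(first_line_of_page(3))  # 133
```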
To automatically leave space at the top and bottom of each page, you need to use
the .wh (when) request. In nroff and troff parlance, this request sets a trap: a
position on the page at which a given macro will be executed.
You’ll notice that we said macro, not request. There’s the rub. To use .wh,
you need to know how to define a macro. It doesn’t work with single requests.
There’s not all that much to defining macros, though. A macro is simply a
sequence of stored requests that can be executed all at once with a single command.
We’ll come back to this later, after we’ve looked at the process of macro definition.
For the moment, let’s assume that we’ve defined two macros, one containing the
commands that will handle the top margin, and another for the bottom margin. The
first macro will be called .TM, and the second .BM. (By convention, macros are
often given names consisting of uppercase letters, to distinguish them from the basic
nroff and troff requests. However, this is a convention only, and one that is not
always followed.)
To set traps that will execute these macros, we would use the .wh request as fol-
lows:
.wh 0 TM
.wh -1i BM
The first argument to .wh specifies the vertical position on the page at which to exe-
cute the macro. An argument of 0 always stands for the top of the page, and a nega-
tive value is always counted from the bottom of the page, as defined by the page length.
In its simplest form, the .TM macro need only contain the single request to space
down 1 inch, and .BM need only contain the single request to break to a new page. If
.wh allowed you to specify a single request rather than a macro, this would be
equivalent to:
.wh 0 .sp 1i
.wh -1i .bp

With an 11-inch page length, this would result in an effective 9-inch text area, because
on every page, the formatter’s first act would be to space down 1 inch, and it would
break to a new page when it reached 1 inch from the bottom.
You might wonder why nroff and troff have made the business of page
transition more complicated than any of the other essential page layout tasks. There are
two reasons:

The nroff and troff programs were designed with the typesetting heritage
in mind. Until fairly recently, most typesetters produced continuous output
on rolls of photographic paper or film. This output was manually cut and
pasted up onto pages.
Especially in troff, page transition is inherently more complex than the
other tasks we’ve described. For example, books often contain headers and
footers that are set in different type sizes or styles. At every page transition,
the software must automatically save information about the current type style,
switch to the style used by the header or footer, and then revert to the original
style when it returns to the main text. Or consider the matter of footnotes: the
position at which the page ends is different when a footnote is on the page.
The page transition trap must make some allowance for this.

In short, what you might like the formatter to do during page transitions can vary. For
this reason, the developers of nroff and troff have allowed users to define their
own macros for handling this area.
When you start out with nroff or troff, we advise you to use one of the
ready-made macro packages, ms or mm. The standard macro package for UNIX systems
based on System V is mm; the standard on Berkeley UNIX systems is ms.
Berkeley UNIX systems also support a third macro package called me. In addition,
there are specialized macro packages for formatting viewgraphs, standard UNIX reference
manual pages (man), and UNIX permuted indexes (mptx). Only the ms and
mm packages are described in this book. The macro packages have already taken into
account many of the complexities in page transition (and other advanced formatting
problems), and provide many capabilities that would take considerable time and effort
to design yourself.
Of course, it is quite possible to design your own macro package, and we will go
into all of the details later. (In fact, this book is coded with neither of the standard
macro packages, but with one developed by Steve Kochan and Pat Wood of Pipeline
Associates, the consulting editors of this series, for use specifically with the Hayden
UNIX library.)

Page Length Revisited


Before we take a closer look at macros, let’s take a moment to make a few more points
about page length, page breaks, and the like.
Assuming that some provision has been made for handling page transitions, there
are several wrinkles to the requests we have already introduced, plus several new
requests that you will probably find useful.
First, let’s talk about page length. It’s important to remember that the printing
area is defined by the interaction of the page length and the location of the traps you
define. For example, you could define a text area 7.5 inches high (as we did in prepar-
ing copy for this book) either by

changing the page length to 9.5 inches, and setting 1-inch margins at the top
and bottom; or
leaving the page length at 11 inches, and setting 1.75-inch margins at the top
and bottom.
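The printing area is just the page length minus the two trap distances, so the alternatives are easy to compare. The Python sketch below uses a hypothetical helper and works in inches throughout. It checks the 9-inch area from the earlier .wh example as well as both ways of getting a 7.5-inch area.

```python
def text_area(page_length, top_margin, bottom_margin):
    """Height left for text between the top-of-page trap (which spaces
    down top_margin) and a bottom trap set at -bottom_margin, in inches."""
    return page_length - top_margin - bottom_margin


print(text_area(11, 1, 1))        # 9   (.wh 0 TM / .wh -1i BM on an 11i page)
print(text_area(9.5, 1, 1))       # 7.5 (.pl 9.5i with 1-inch margins)
print(text_area(11, 1.75, 1.75))  # 7.5 (11i page with 1.75-inch margins)
```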

In general, we prefer to think of .pl as setting the paper length, and use the page
transition traps to set larger or smaller margins.
However, there are cases where you really are working with a different paper size.
A good example of this is printing addresses on envelopes: the physical paper height is
about 4 inches (24 lines on a typewriter-like printer printing 6 lines per inch), and we
want to print in a narrow window consisting of four or five lines. A good set of defini-
tions for this case would be:
.pl 4i
.wh 0 TM
.wh -9v BM
with .TM containing the request .sp 9v, and with .BM, as before, containing
.bp.
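Here is how the envelope settings work out in whole lines. The sketch assumes a printer that advances exactly 6 lines per inch, and it treats the top trap's .sp 9v and the bottom trap at -9v as whole lines. That is a simplification of how traps actually fire, and window_lines is a hypothetical name.

```python
LINES_PER_INCH = 6  # typewriter-like printer printing 6 lines per inch


def window_lines(page_length_inches, top_lines, bottom_lines):
    """Lines of text that fit between a top trap that spaces down
    top_lines (e.g. .sp 9v) and a bottom trap at -bottom_lines
    (e.g. .wh -9v BM). A whole-line simplification of trap timing."""
    total_lines = round(page_length_inches * LINES_PER_INCH)
    return total_lines - top_lines - bottom_lines


print(window_lines(4, 9, 9))  # 24 - 9 - 9 = 6
```

A window of at most six lines is comfortable room for a four- or five-line address.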
There is more to say about traps, but it will make more sense later, so we’ll leave
the subject for now.

Page Breaks without Line Breaks


Page breaks: we’ve talked about their use in page transition traps, but they also have a
common use on their own. Often, you will want to break a page before it would nor-
mally end. For example, if the page breaks right after the first line of a paragraph, you
will probably want to force the line onto the next page, rather than leaving an
“orphaned” line. Or you might want to leave blank space at the bottom of a page for
an illustration. To do this, simply enter a .bp at the desired point. A new page will
be started immediately.
However, consider the case in which you need to force a break in the middle of a
paragraph to prevent a “widowed” line at the top of the next page. If you do this:
The medieval masters of calligraphy and illumination
are largely unknown to us. We thankfully have examples
of their work, and even
.bp
marginal notes by the copyists of some manuscripts,
but the men who produced these minute masterpieces
are anonymous.
the .bp request will also cause a line break, and the text will not be filled properly:

The medieval masters of calligraphy and illumination
are largely unknown to us. We thankfully have examples
of their work, and even

          New page begins here

marginal notes by the copyists of some manuscripts, but
the men who produced these minute masterpieces are
anonymous.

Fortunately, there is a way around this problem. If you begin a request with an apos-
trophe instead of a period, the request will not cause a break.

The medieval masters of calligraphy and illumination
are largely unknown to us. We thankfully have examples
of their work, and even
'bp
marginal notes by the copyists of some manuscripts,
but the men who produced these minute masterpieces
are anonymous.

Now we have the desired result:

The medieval masters of calligraphy and illumination
are largely unknown to us. We thankfully have examples

          New page begins here

of their work, and even marginal notes by the copyists
of some manuscripts, but the men who produced these
minute masterpieces are anonymous.

(In fact, most page transition macros use this feature to make paragraphs continue
across page boundaries. We’ll take a closer look at this in later chapters.)
Another very useful request is the conditional page break, or .ne (need) request.
If you want to make sure an entire block of text appears on the same page, you can use
this request to force a page break if there isn’t enough space left. If there is sufficient
space, the request is ignored.
For example, the two requests:
.ne 3.2i
.sp 3i

might be used to reserve blank space to paste in an illustration that is 3 inches high.
The .ne request does not cause a break, so you should be sure to precede it with
.br or another request that causes a break if you don’t want the remnants of the
current line buffer carried to the next page if the .ne is triggered.
It is often better to use .ne instead of .bp, unless you’re absolutely sure that
you will always want a page break at a particular point. If, in the course of editing, an
.ne request moves away from the bottom of the page, it will have no effect. But a
.bp will always start a new page, sometimes leaving a page nearly blank when the text
in a file has been changed significantly.
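The decision .ne makes is a single comparison, which is why it is safer than .bp when text moves around during editing. The sketch below models it in Python under our own simplifying assumption that "remaining" means the distance left before the bottom-of-page trap. The name needs_break is hypothetical.

```python
def needs_break(remaining, needed):
    """Model of .ne: break to a new page only if less than `needed`
    space is left before the bottom trap (same units for both)."""
    return remaining < needed


print(needs_break(remaining=2.5, needed=3.2))  # True: start a new page
print(needs_break(remaining=5.0, needed=3.2))  # False: .ne has no effect
print(needs_break(remaining=3.2, needed=3.2))  # False: exactly enough room
```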
There are other special spacing requests that can be used for this purpose.
(Depending on the macro package, these may have to be used.) For example, .sv
(save space) requests a block of contiguous space. If the remainder of the page does
not contain the requested amount of space, no space is output. Instead, the amount of
space requested is remembered and is output when an .os (output saved space)
request is encountered.
These are advanced requests, but you may need to know about them because most
macro packages include two other spacing requests in their page transition macros:
.ns (no space) and .rs (restore space). An .ns inhibits the effect of spacing
requests; .rs restores the effectiveness of such requests.

Both the ms and mm macros include an .ns request in their page transition
macros. As a result, if you issue a request like:
.sp 3i

with 1 inch remaining before the bottom of the page, you will not get 1 inch at the bot-
tom, plus 2 inches at the top of the next page, but only whatever remains at the bottom.
The next page will start right at the top. However, both macro packages also include an
.os request in their page top macro, so if you truly want 3 inches, use .sv 3i, and
you will get the expected result.
However, if you use .sv, you will also have another unexpected result: text
following the spacing request will “float” ahead of it to fill up the remainder of the
current page.
We’ll talk more about this later. We introduced it now to prevent confusion when
spacing requests don’t always act the way you expect.
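The difference between .sp and .sv at a page boundary can be summarized as a small model. The Python function below is our own sketch, and its name is hypothetical. It assumes, as described above for ms and mm, that the page-top macro issues .ns (so leftover .sp space is discarded) and then .os (so saved .sv space is output). It does not model the text that floats ahead of saved space.

```python
def space_request(amount, remaining, request):
    """Model where blank space ends up when a spacing request meets a
    page boundary, assuming the page-top macro issues .ns and then .os.

    request "sp": space until the bottom trap; whatever is left over is
                  thrown away, because .ns is in effect on the new page.
    request "sv": output the space only if it all fits; otherwise output
                  nothing now and save the full amount for .os.
    Returns (space on this page, space at the top of the next page).
    """
    if request == "sp":
        return min(amount, remaining), 0
    if request == "sv":
        if amount <= remaining:
            return amount, 0
        return 0, amount
    raise ValueError(f"unknown request {request!r}")


# 3 inches requested with only 1 inch left on the page:
print(space_request(3, 1, "sp"))  # (1, 0)
print(space_request(3, 1, "sv"))  # (0, 3)
```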

Page Numbering
The nroff and troff programs keep track of page numbers and make the current
page number available to be printed out (usually by a page transition macro). You can
artificially set the page number with the .pn request:

.pn 5 Set the current page number to 5


.pn +5 Increment the current page number by 5
.pn -5 Decrement the current page number by 5

You can also artificially set the number for the next page whenever you issue a .bp
request, simply by adding a numeric argument:

.bp 5 Break the page and set the next page number to 5
.bp +5 Break the page and increment the next page number by 5
.bp -5 Break the page and decrement the next page number by 5

In addition to inhibiting .sp, the .ns request inhibits the action of .bp, unless a
page number is specified. This means (at least in the existing macro packages) that the
sequence:
.bp
.bp
will not result in a blank page being output. You will get the same effect as if you had
specified only a simple .bp. Instead, you should specify:
.bp +1

The starting page number (usually 1) can also be set from the command line, using the
-n option. For example:

$ nroff -ms -n10 file

will start numbering file at page number 10. In addition, there is a command-line
option to print only selected pages of the output. The -o option takes a list of page
numbers as its argument. The entire file (up to the last page number in the list) is processed,
but only the specified pages are output. The list can include single pages
separated by commas, or a range of pages separated by a hyphen, or both. A number
followed by a trailing hyphen means to output from that page to the end. For example:
$ nroff -ms -o1,5,7-9,13- file
will output pages 1, 5, 7 through 9, and from 13 to the end of the file. There should be
no spaces anywhere in the list.
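The page-list syntax is compact, so it helps to see it spelled out. The Python sketch below is our own parser for just the forms described here: single pages, closed ranges, and open-ended ranges with a trailing hyphen. It illustrates the rules and is not nroff's actual code. The name parse_page_list is hypothetical.

```python
def parse_page_list(spec):
    """Parse a page list such as "1,5,7-9,13-" and return a function
    that reports whether a given page should be output."""
    ranges = []
    for item in spec.split(","):
        if "-" in item:
            start, _, end = item.partition("-")
            low = int(start)
            high = int(end) if end else None  # trailing hyphen: to the end
        else:
            low = high = int(item)
        ranges.append((low, high))

    def wanted(page):
        return any(low <= page and (high is None or page <= high)
                   for low, high in ranges)

    return wanted


wanted = parse_page_list("1,5,7-9,13-")
print([page for page in range(1, 16) if wanted(page)])
# [1, 5, 7, 8, 9, 13, 14, 15]
```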

Changing Fonts
In old troff (otroff), you were limited to four fonts at a time, because the fonts
had to be physically mounted on the C/A/T typesetter. With ditroff and a laser
printer or a modern typesetter, you can use a virtually unlimited number of fonts in the
same document.
In otroff you needed to specify the basic fonts that are in use with the .fp
(font position) request. Normally, at the front of a file (or, more likely, in the macro
package), you would use this request to specify which fonts are mounted in each of the
four quadrants (positions) of the typesetter wheel. By default, the roman font is
mounted in position 1, the italic font in position 2, the bold font in position 3, and the
special font in position 4. That is, troff acts as though you had included the lines:
.fp 1 R
.fp 2 I
.fp 3 B
.fp 4 S

In ditroff, up to ten fonts are automatically mounted, with the special font in position
10. Which fonts are mounted, and in which positions, depends on the output device.
See Appendix D for details. The font that is mounted in position 1 will be used
for the body type of the text; it is the font that will be used if no other specification is
given. The special font is also used without any intervention on your part when a character
not in the normal character set is requested.
To request one of the other fonts, you can use either the .ft request, or the
inline font-switch escape sequence \f.
For example:
.ft B
This line will be set in bold type.
.br
.ft R
This line will again be set in roman type.

will produce:
This line will be set in bold type.
This line will again be set in roman type.
You can also change fonts using an inline font escape sequence. For example, the
preceding sentence was coded like this:
...an inline font \fIescape sequence\fP.

You may wonder at the \fP at the end, rather than \fR. The P command is a special
code that can be used with either the .ft request or the \f escape sequence. It
means “return to the previous font, whatever it was.” This is often preferable to an
explicit font request, because it is more general.
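It helps to know that troff keeps track of only one previous font, not a whole history. \fP therefore means "the font in effect just before the last change." The Python class below is a hypothetical model of that single-slot behavior, not troff source code.

```python
class FontState:
    """Model of the single 'previous font' slot used by \\fP and .ft P."""

    def __init__(self, font="R"):
        self.current = font
        self.previous = font

    def switch(self, name):
        """Change fonts; the name "P" means the previous font."""
        if name == "P":
            name = self.previous
        self.previous, self.current = self.current, name
        return self.current


state = FontState("R")
state.switch("B")         # \fB
state.switch("I")         # \fI
print(state.switch("P"))  # B: back to the font in effect before \fI
print(state.switch("P"))  # I: with one slot, a second \fP toggles back
```

This is why a bare \fP is safe after a single font change but can surprise you after two changes in a row.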
All of this raises the question of fonts other than Times Roman, Bold, and
Italic. There are two issues: first, which fonts are available on the output device, and
second, which fonts does troff have width tables for. (As described previously,
troff uses these tables to determine how far to space over after it outputs each character.)
For otroff these width tables are in the directory /usr/lib/font, in
files whose names begin with ft. If you list the contents of this directory, you might
see something like this for otroff:
$ ls /usr/lib/font
ftB   ftBC  ftC   ftCE  ftCI
ftCK  ftCS  ftCW  ftFD  ftG
ftGI  ftGM  ftGR  ftH   ftHB
ftHI  ftI   ftL   ftLI  ftPA
ftPB  ftPI  ftR   ftS   ftSB
ftSI  ftSM  ftUD

You can pick out the familiar R, I, B, and S fonts, and may guess that ftH, ftHI,
and ftHB refer to Helvetica, Helvetica Italic, and Helvetica Bold fonts. However,
unless you are familiar with typesetting, the other names might as well be Greek to you.
In any event, these width tables, normally supplied with troff, are for fonts that are
commonly used with the C/A/T typesetter. If you are using a different device, they may
be of no use to you.
The point is that if you are using a different typesetting device, you will need to
get information about the font names for your system from whoever set up the equip-
ment to work with troff. The contents of /usr/lib/font will vary from
installation to installation, depending on what fonts are supported.
For ditroff, there is a separate subdirectory in /usr/lib/font for each
supported output device. For example:
$ ls /usr/lib/font
devlj  devps
$ ls /usr/lib/font/devps
B.out     BI.out    CB.out    CI.out    CW.out    CX.out
DESC.out  H.out     HB.out    HI.out    HK.out    HO.out
HX.out    I.out     LI.out    PA.out    PB.out    PI.out
PX.out    R.out     0.out     RS.out    S.out     S1.out
94 0 UNlX Text Processing 0

Here, the font name is followed by the string .out.
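Because the file names follow a fixed pattern, you can recover the font names mechanically. The short Python sketch below assumes only the two conventions described above: an ft prefix for otroff and an .out suffix for ditroff. It skips DESC.out, which holds the device description rather than a font. The function names are our own.

```python
def otroff_font_names(filenames):
    # otroff: width tables are named ft<font>, e.g. ftHB.
    return [name[2:] for name in filenames if name.startswith("ft")]


def ditroff_font_names(filenames):
    # ditroff: width tables are named <font>.out; DESC.out is the
    # device description, not a font.
    return [name[:-len(".out")] for name in filenames
            if name.endswith(".out") and name != "DESC.out"]


print(otroff_font_names(["ftB", "ftH", "ftHB", "ftHI", "ftR"]))
print(ditroff_font_names(["B.out", "DESC.out", "H.out", "HB.out"]))
```

On a real system you could pass in the result of os.listdir() on the appropriate directory.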


Again, the font names themselves are probably Greek to you. However, with
ditroff, you can actually use any of these names and see what results they give
you, because all fonts should be available at any time.
For the sake of argument, let's assume that your typesetter or other troff-compatible
equipment supports the Helvetica font family shown in Figure 4-3, with the
names H, HI, and HB. (This is a fairly reasonable assumption, because Helvetica is
probably the most widely available font family after Times.)

[Figure: the lowercase and uppercase alphabets, the digits 0-9, and common punctuation and special characters, each set in Helvetica, Helvetica Italic, and Helvetica Bold]

Fig. 4-3. Helvetica Fonts

When specifying two-character font names with the \f escape sequence, you
must add the ( prefix as well. For example, you would specify Helvetica Italic by the
inline sequence \f(HI, and Helvetica Bold by \f(HB.
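This rule is easy to get wrong when you generate troff input from a program. A small Python helper might look like the following. It is hypothetical and assumes only the one- and two-character font names discussed here.

```python
def font_escape(name):
    # Return the inline escape that selects font `name`.
    if len(name) == 1:
        return "\\f" + name          # e.g. \fB
    if len(name) == 2:
        return "\\f(" + name         # e.g. \f(HB
    raise ValueError("expected a one- or two-character font name: %r" % name)


def in_font(text, name):
    # Set `text` in font `name`, then return to the previous font with \fP.
    return font_escape(name) + text + "\\fP"


print(in_font("Alcuin", "HI"))
```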
0 n r o f f and t r o f f 0 95

There is another issue when you are using fonts other than the Times Roman family.
Assume that you decide to typeset your document in Helvetica rather than Times Roman.
You reset your initial font position settings to read:
.fp 1 H
.fp 2 HI
.fp 3 HB
.fp 4 S

However, throughout the text, you have requests of the form:


.ft B

or:
\fB

You will need to make a set of global replacements throughout your file. To insulate
yourself in a broader way from overall font change decisions, troff allows you to
specify fonts by position, even within .ft and \f requests:

.ft 1 or \f1   Use the font mounted in position 1
.ft 2 or \f2   Use the font mounted in position 2
.ft 3 or \f3   Use the font mounted in position 3
.ft 4 or \f4   Use the font mounted in position 4
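The benefit is that the text refers to a position, not a name. The toy Python model below has hypothetical names, and its starting mounts are assumed to be the Times family plus the special font. It shows that remounting fonts changes what \f3 means without editing a single line of text.

```python
mounts = {1: "R", 2: "I", 3: "B", 4: "S"}   # assumed starting mounts


def fp(position, name):
    # Like ".fp position name": mount a font in a position.
    mounts[position] = name


def resolve(font):
    # Like ".ft font": a digit selects by position, anything else by name.
    return mounts[int(font)] if font.isdigit() else font


print(resolve("3"))          # B
fp(1, "H")
fp(2, "HI")
fp(3, "HB")
print(resolve("3"))          # HB: the same \f3 now selects Helvetica Bold
print(resolve("B"))          # B: a font named explicitly is unaffected
```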

Because you don’t need to use the .fp request to set font positions with ditroff,
and the range of fonts is much greater, you may have a problem knowing which fonts
are mounted in which positions. A quick way to find out which fonts are mounted is to
run ditroff on a short file, sending the output to the screen. For example:
$ ditroff -Tps junk | more
x T ps
x res 720 1 1
x init
x font 1 R
x font 2 I
x font 3 B
x font 4 BI
x font 5 CW
x font 6 CB
x font 7 H
x font 8 HB
x font 9 HI
x font 10 S
...
The font positions should appear at the top of the file. In this example, you see the following
fonts: (Times) Roman, (Times) Italic, (Times) Bold, (Times) Bold Italic, Constant
Width, Constant Bold, Helvetica, Helvetica Bold, Helvetica Italic, and Special.
Which font is mounted in which position is controlled by the file DESC.out in the
device subdirectory of /usr/lib/font. See Appendix D for details.
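Rather than reading the x font lines by eye, you could extract them with a short script. This Python sketch assumes only the line format shown in the sample output above: x font, then a position, then a name. The name mounted_fonts is hypothetical.

```python
def mounted_fonts(output):
    # Map each "x font <position> <name>" line to {position: name}.
    table = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) == 4 and fields[:2] == ["x", "font"]:
            table[int(fields[2])] = fields[3]
    return table


sample = """x T ps
x res 720 1 1
x init
x font 1 R
x font 2 I
x font 3 B
x font 7 H"""

print(mounted_fonts(sample))   # {1: 'R', 2: 'I', 3: 'B', 7: 'H'}
```

To use it on real output, you might pipe ditroff's output into a script that passes sys.stdin.read() to the function.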

Special Characters
A variety of special characters that are not part of the standard ASCII character set are
supported by nroff and troff. These include Greek letters, mathematical symbols,
and graphic characters. Some of these characters are part of the font referred to
earlier as the special font. Others are part of the standard typesetter fonts.
Regardless of the font in which they are contained, special characters are included
in a file by means of special four-character escape sequences beginning with \(.
Appendix B gives a complete list of special characters. However, some of the
most useful are listed in Table 4-4, because even as a beginner you may want to include
them in your text. Although nroff makes a valiant effort to produce some of these
characters, they are really best suited for troff.

TABLE 4-4. Special Characters

Name                    Escape Sequence   Output Character
em dash                 \(em              —
bullet                  \(bu              •
square                  \(sq              □
baseline rule           \(ru              _
underrule               \(ul              _
1/4                     \(14              ¼
1/2                     \(12              ½
3/4                     \(34              ¾
degrees                 \(de              °
dagger                  \(dg              †
double dagger           \(dd              ‡
registered mark         \(rg              ®
copyright symbol        \(co              ©
section mark            \(sc              §
square root             \(sr              √
greater than or equal   \(>=              ≥
less than or equal      \(<=              ≤
not equal               \(!=              ≠
multiply                \(mu              ×
divide                  \(di              ÷
plus or minus           \(+-              ±
right arrow             \(->              →
left arrow              \(<-              ←
up arrow                \(ua              ↑
down arrow              \(da              ↓

We’ll talk more about some of these special characters as we use them. Some are
used internally by eqn for producing mathematical equations. The use of symbols
such as the copyright, registered trademark, and dagger is fairly obvious.
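The Python sketch below shows how the four-character sequences are recognized. It previews troff input on a terminal by replacing a few sequences from Table 4-4 with Unicode characters. It is an illustration only, not how troff works: the mapping table is our own, covers only some entries, and leaves unknown names untouched.

```python
import re

# A few entries from Table 4-4, mapped to Unicode for previewing.
SPECIAL = {
    "em": "\u2014", "bu": "\u2022", "de": "\u00b0", "dg": "\u2020",
    "co": "\u00a9", "rg": "\u00ae", "sc": "\u00a7", "mu": "\u00d7",
    "+-": "\u00b1", "->": "\u2192",
}


def preview(text):
    # \(xx is four characters: backslash, "(", then a two-character name.
    return re.sub(r"\\\((..)",
                  lambda m: SPECIAL.get(m.group(1), m.group(0)),
                  text)


print(preview(r"Copyright \(co 1987, Gutenberg Galaxy Software"))
```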

However, you shouldn’t limit yourself to the obvious. Many of these special
characters can be put to innovative use. For example, the square root symbol can be
used to simulate a check mark, and the square can become an alternate type of bullet.
As we’ll show in Chapter 15, you can create additional, effective character combina-
tions, such as a checkmark in a box, with overstriking.
The point is to add these symbols to your repertoire, where they can wait until
need and imagination provide a use for them.

Type Size Specification


Typesetting also allows for different overall sizes of characters. Typesetting character
sizes are described by units called points. A point is approximately 1/72 of an inch.
Typical type sizes range from 6 to 72 points. A few different sizes follow:

This line is set in 6-point type.
This line is set in 8-point type.
This line is set in 10-point type.
This line is set in 12-point type.
This line is set in 14-point type.
This line is set in 18-point type.
(The exact size of a typeface does not always match its official size designation. For
example, 12-point type is not always 1/6 inch high, nor is 72-point type 1 inch high.
The precise size will vary with the typeface.)
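The arithmetic behind the nominal sizes is simple. This minimal Python sketch takes a point as exactly 1/72 inch, which is the value troff uses for its p unit. It converts the nominal sizes mentioned above.

```python
from fractions import Fraction

POINTS_PER_INCH = 72


def points_to_inches(points):
    # Nominal size only; as noted above, actual glyph heights vary by typeface.
    return Fraction(points, POINTS_PER_INCH)


print(points_to_inches(12))   # 1/6
print(points_to_inches(72))   # 1
```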
As with font changes, there are two ways to make size changes: with a request
and with an inline escape sequence. The .ps request sets the point size. For exam-
ple:

.ps 10 Set the point size to 10 points

A .ps request that does not specify any point size reverts to the previous point size
setting, whatever it was:

.ps 10

Some text here

.ps     Revert to the point size before we changed it

To switch point size in the middle of the line, use the \s escape sequence. For example,
many books reduce the point size when they print the word UNIX in the middle of a
line. The preceding sentence was produced by these input lines:

For example, many books reduce the point size when
they print the word \s8UNIX\s0 in the middle of a line.

As you can probably guess from the example, \s0 does not mean to use a point size
of 0, but to revert to the previous size.
In addition, you can use relative values when specifying point sizes. Knowing
that the body of the book is set in 10-point type, we could have achieved the same
result by entering:
For example, many books reduce the point size when
they print the word \s-2UNIX\s0 in the middle of a line.

Relative changes are limited to a single digit; that is, you can’t
increment or decrement the size by more than 9 points at a time.
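The rules for \s can be summarized in a small model. This Python sketch is hypothetical. Like the font model earlier in the chapter, it assumes that troff remembers one previous size, so \s0 (or .ps with no argument) swaps back to it. It also assumes that a relative change takes exactly one digit.

```python
class SizeState:
    # Toy model of point-size changes: one remembered previous size
    # (an assumption, not troff's code).

    def __init__(self, size=10):
        self.current = size
        self.previous = size

    def _set(self, size):
        self.previous, self.current = self.current, size

    def escape(self, arg):
        # `arg` is whatever follows \s: "0", "8", "+2", "-2", ...
        if arg == "0":
            self.current, self.previous = self.previous, self.current
        elif arg[0] in "+-":
            if len(arg) != 2 or not arg[1].isdigit():
                raise ValueError("relative changes take a single digit")
            self._set(self.current + int(arg))
        else:
            self._set(int(arg))
        return self.current


body = SizeState(10)          # body text in 10-point type
print(body.escape("-2"))      # 8, like \s-2
print(body.escape("0"))       # 10, like \s0
```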
Only certain sizes may be available on the typesetter. (Legal point sizes in
otroff are 6, 7, 8, 9, 10, 11, 12, 14, 16, 18, 20, 22, 24, 28, and 36. Legal point sizes
in ditroff depend upon the output device, but there will generally be more sizes
available.) If you request a point size between two legal sizes, otroff will round up
to the next legal point size; ditroff will round to the nearest available size.
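The two rounding rules can be compared directly. In this Python sketch, the otroff list comes from the paragraph above. The ditroff size list in the example is hypothetical, since real lists depend on the device. Clamping requests above 36 and breaking ties toward the smaller size are our own assumptions.

```python
import bisect

OTROFF_SIZES = [6, 7, 8, 9, 10, 11, 12, 14, 16, 18, 20, 22, 24, 28, 36]


def otroff_size(requested):
    # Round up to the next legal size; clamping at 36 is our assumption.
    i = bisect.bisect_left(OTROFF_SIZES, requested)
    return OTROFF_SIZES[min(i, len(OTROFF_SIZES) - 1)]


def nearest_size(requested, available):
    # ditroff-style rounding to the nearest available size;
    # breaking ties toward the smaller size is our assumption.
    return min(available, key=lambda size: (abs(size - requested), size))


print(otroff_size(13))                      # 14
print(nearest_size(25, [10, 12, 24, 28]))   # 24
```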

Vertical Spacing
In addition to its ability to change typefaces and type sizes on the same page, a
typesetter allows you to change the amount of vertical space between lines. This spacing
is sometimes referred to as the baseline spacing because it is the distance between
the base of characters on successive lines. (The difference between the point size and
the baseline spacing is referred to as leading, from the old days when a human compositor
inserted thin strips of lead between successive lines of type.)
A typewriter or typewriter-style printer usually spaces vertically in 1/6-inch increments
(i.e., 6 lines per inch). A typesetter usually adjusts the space according to the
point size. For example, the type samples shown previously were all set with 20 points
of vertical space. More typically, the vertical space will vary along with the type size,
like this:

This line is set in 6-point type and 8-point spacing.
This line is set in 8-point type and 10-point spacing.
This line is set in 10-point type and 12-point spacing.
This line is set in 12-point type and 14-point spacing.
This line is set in 14-point type and 16-point spacing.
This line is set in 18-point type and 20-point spacing.
Typically, the body of a book is set with a single size of type (usually 9 or 10 point,
with vertical spacing set to 11 or 12 points, respectively). Larger sizes are used occasionally
for emphasis, for example, in chapter or section headings. When the type size
is changed, the vertical spacing needs to be changed too, or the type will overrun the
previous line, as follows, where 14-point type is shown with only 10-point spacing.

Here is type larger than
the space allotted for it.
Vertical spacing is changed with the .vs request. A vertical space request will
typically be paired with a point size request:
.ps 10
.vs 12
After you set the vertical spacing with .vs, this becomes the basis of the v unit for
troff. For example, if you enter .vs 12, the request .sp will space down 12
points; the request:
.sp 0.5v

will space down 6 points, or half the current vertical line spacing. However, if you
change the baseline vertical spacing to 16, the .sp request will space down 16 points.
Spacing specified in any other units will be unaffected. What all this adds up to is the
commonsense observation that a blank line takes up the same amount of space as one
containing text.
Double and triple spacing apply a multiplication factor to the
baseline spacing. The request .ls 2 will double the baseline spacing. You can
specify any multiplication factor you like, though 2 and 3 are the most reasonable
values.
The .ls request will only affect the spacing between output lines of text. It
does not change the definition of v or affect vertical spacing requests.
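The interaction between .vs, .ls, and the v unit can be checked with a toy model. This Python sketch is our own simplification and measures all distances in points. It reproduces the numbers in the text: with .vs 12, .sp 0.5v moves down 6 points; with .vs 16, .sp moves down 16 points even under .ls 2, while each line of text advances 32.

```python
from fractions import Fraction


class VerticalSpacing:
    # Toy model (an assumption, not troff's code); all distances in points.

    def __init__(self, vs=12, ls=1):
        self.vs = vs        # baseline spacing, as set by .vs
        self.ls = ls        # line-spacing factor, as set by .ls

    def sp(self, amount_in_v=1):
        # ".sp Nv" moves down N times the current .vs; .ls plays no part.
        return Fraction(str(amount_in_v)) * self.vs

    def line_advance(self):
        # Distance from one output line of text to the next.
        return self.vs * self.ls


page = VerticalSpacing(vs=12)
print(page.sp(0.5))          # 6
page.vs = 16
page.ls = 2
print(page.sp())             # 16: one v is still one .vs
print(page.line_advance())   # 32: double spacing affects only text lines
```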

A First Look at Macros

Although we won’t go into all the details of macro design until we have discussed the
existing macro packages in the next two chapters, we’ll cover some of the basic con-
cepts here. This will help you understand what the macro packages are doing and how
they work.
To define a macro, you use the .de request, followed by the sequence of
requests that you want to execute when the macro is invoked. The macro definition is
terminated by the request .. (two dots). The name to be assigned to the macro is
given as an argument to the .de request.
You should consider defining a macro whenever you find yourself issuing a
repetitive sequence of requests. If you are not using one of the existing macro packages
(which have already taken care of this kind of thing), paragraphing is a good example of
the kind of formatting that lends itself to macros.
Although it is certainly adequate to separate paragraphs simply by a blank line,
you might instead want to separate them with a blank line and a temporary indent.
What’s more, to prevent “orphaned” lines, you would like to be sure that at least two
lines of each paragraph appear at the bottom of the page. So you might define the fol-
lowing macro:

.de P
.sp
.ne 2
.ti 5n
..
This is the simplest kind of macro: a straightforward sequence of stored commands.


However, macros can take arguments, take different actions depending on the presence
or absence of various conditions, and do many other interesting and wonderful things.
We'll talk more about the enormous range of potential in macros in later chapters.
For the moment, let's just consider one or two points that you will need to understand
in order to use the existing macro packages.
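The following toy Python model makes the idea of a stored sequence concrete by imitating .de and macro invocation. It is a deliberate simplification: real troff reads definitions in copy mode, handles arguments, and does much more. The functions define_macros and expand are hypothetical.

```python
def define_macros(lines):
    # Collect ".de NAME" ... ".." blocks; return (macros, remaining lines).
    macros, rest = {}, []
    name, body = None, []
    for line in lines:
        if name is not None:               # inside a definition
            if line == "..":
                macros[name] = body
                name, body = None, []
            else:
                body.append(line)
        elif line.startswith(".de "):
            name = line.split()[1]
        else:
            rest.append(line)
    return macros, rest


def expand(lines, macros):
    # Replace each call of a defined macro with its stored requests.
    out = []
    for line in lines:
        words = line[1:].split() if line.startswith(".") else []
        if words and words[0] in macros:
            out.extend(macros[words[0]])
        else:
            out.append(line)
    return out


source = [".de P", ".sp", ".ne 2", ".ti 5n", "..",
          ".P", "First paragraph.", ".P", "Second paragraph."]
macros, text = define_macros(source)
for line in expand(text, macros):
    print(line)
```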

Macro Arguments
Most basic troff requests take simple arguments: single characters or letters. Many
macros take more complex arguments, such as character strings. There are a few simple
pointers you need to keep in mind through the discussion of macro packages in the next
two chapters.
First, a space is taken by default as the separator between arguments. If a single
macro argument is a string that contains spaces, you need to quote the entire string to
keep it from being treated as a series of separate arguments.
For example, imagine a macro to print the title of a chapter in this book. The
macro call looks like this:
.CH 4 "Nroff and Troff"

A second point: to skip an argument that you want to ignore, supply a null string ("").
For example:
.CH "" "Preface"

As you can see, it does no harm to quote a string argument that doesn't contain spaces
("Preface"), and it is probably a good habit to quote all strings.
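The quoting rules above can be summarized as a small parser. This Python sketch simplifies under three stated assumptions: spaces separate arguments, a pair of double quotes groups words into one argument, and "" yields an empty argument. It ignores troff's finer points, such as embedded quotes and escapes. The name macro_args is hypothetical.

```python
def macro_args(call):
    # Split a macro call into its name and a list of arguments.
    words = []
    i, n = 0, len(call)
    while i < n:
        if call[i] == " ":
            i += 1
        elif call[i] == '"':
            end = call.find('"', i + 1)
            if end == -1:              # unterminated quote: take the rest
                end = n
            words.append(call[i + 1:end])
            i = end + 1
        else:
            end = call.find(" ", i)
            if end == -1:
                end = n
            words.append(call[i:end])
            i = end
    return words[0], words[1:]


print(macro_args('.CH 4 "Nroff and Troff"'))   # ('.CH', ['4', 'Nroff and Troff'])
print(macro_args('.CH "" "Preface"'))          # ('.CH', ['', 'Preface'])
```

Without the quotes, the first call would pass four separate arguments: 4, Nroff, and, Troff.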

Number Registers
When you use a specific value in a macro definition, you are limited to that value when
you use the macro. For example, in the paragraph macro definition shown previously,
the space will always be 1, and the indent always 5n.
However, nroff and troff allow you to save numeric values in special variables
known as number registers. If you use the value of a register in a macro definition,
the action of the macro can be changed just by placing a new value in the register.
For example, in ms, the size of the top and bottom margins is not specified with an
absolute value, but with a number register. As a result, you don't need to change the
macro definition to change these margins; you simply reset the value of the appropriate
number register. Just as importantly, the contents of number registers can be used as
flags (a kind of message between macros). There are conditional statements in the
markup language of nroff and troff, so that a macro can say: "If number register
Y has the value x, then do thus-and-so. Otherwise, do this." For example, in the mm
macros, hyphenation is turned off by default. To turn it on, you set the value of a certain
number register to 1. Various macros test the value of this register, and use it as a
signal to re-enable hyphenation.
To store a value into a number register, use the .nr request. This request takes
two arguments: the name of a number register,* and the value to be placed into it.
For example, in the ms macros, the size of the top and bottom margins is stored
in the registers HM (header margin) and FM (footer margin). To reset these margins
from their default value of 1 inch to 1.75 inches (thus producing a shorter page like the
one used in this book), all you would need to do is to issue the requests:
.nr HM 1.75i
.nr FM 1.75i
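A value such as 1.75i is converted to the formatter's basic units before it is stored. This Python sketch shows the arithmetic under several assumptions. It uses the resolution of 720 units per inch reported by the ditroff -Tps output earlier in this chapter, and 432 units per inch for otroff on the C/A/T. It treats a value with no unit as basic units and rounds to the nearest unit. The function name is hypothetical.

```python
from fractions import Fraction


def to_units(expr, resolution=720):
    # Convert a scaled value such as "1.75i" to basic device units.
    # Rounding to the nearest unit is our assumption.
    factors = {
        "i": Fraction(resolution),             # inches
        "c": Fraction(resolution * 50, 127),   # centimeters (2.54 per inch)
        "p": Fraction(resolution, 72),         # points
        "u": Fraction(1),                      # basic units
    }
    if expr[-1] in factors:
        number, unit = expr[:-1], expr[-1]
    else:
        number, unit = expr, "u"               # no unit: assume basic units
    return round(Fraction(number) * factors[unit])


print(to_units("1.75i"))                   # 1260
print(to_units("1.75i", resolution=432))   # 756
```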

You can also set number registers with single-character names from the command line
by using the -r option. (The mm macros make heavy use of this capability.) For
example:
$ nroff -mm -rN1 file
will format file using the mm macros, with number register N set to the value 1. We
will talk more about using number registers later, when we describe how to write your
own macros. For the moment, all you need to know is how to put new values into
existing registers. The next two chapters will describe the particular number registers
that you may find useful with the mm and ms macro packages.

Predefined Strings
The mm and ms macro packages also make use of some predefined text strings. The
nroff and troff programs allow you to associate a text string with a one- or two-character
string name. When the formatter encounters a special escape sequence including
the string name, the complete string is substituted in the output.
To define a string, use the .ds request. This request takes two arguments, the
string name and the string itself. For example:
.ds nt Nroff and Troff

The string should not be quoted. It can optionally begin with a quotation mark, but it
should not end with one, or the concluding quotation mark will appear in the output. If
you want to start a string with one or more blank spaces, though, you should begin the
definition with a quotation mark. Even in this case, there is no concluding quotation
mark. As always, the string is terminated by a newline.
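The quotation-mark rule is easy to state as code. This Python sketch is a simplification, and parse_ds is a hypothetical name. It strips one optional leading quote, which is what lets a string begin with blanks. It deliberately keeps a trailing quote, reproducing the mistake the paragraph warns about.

```python
import re


def parse_ds(line):
    # Split ".ds name text" into (name, text).
    match = re.match(r"\.ds\s+(\S+)\s*(.*)$", line)
    if match is None:
        raise ValueError("not a .ds request: %r" % line)
    name, text = match.group(1), match.group(2)
    if text.startswith('"'):
        text = text[1:]        # an opening quote is dropped...
    return name, text          # ...but a closing quote is kept


print(parse_ds(".ds nt Nroff and Troff"))   # ('nt', 'Nroff and Troff')
print(parse_ds('.ds Lb "   indented'))       # ('Lb', '   indented')
print(parse_ds('.ds Qt "oops"'))             # ('Qt', 'oops"')
```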

*Number register names can consist of either one or two characters, just like macro names. However, they
are distinct; that is, a number register and a macro can be given the same name without conflict.

You can define a multiline string by hiding the newlines with a backslash. For
example:
.ds LS This is a very long string that goes over \
more than one line.

When the string is interpolated, it will be subject to filling (unless no-fill mode is in
effect) and may not be broken into lines at the same points as you’ve specified in the
definition. To interpolate the string in the output, you use one of the following escape
sequences:
\*a
\*(ab
where a is a one-character string name, and ab is a two-character string name.
To use the nt string we defined earlier, you would type:
\*(nt
It would be replaced in the output by the words Nroff and Troff.
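Interpolation is a plain text substitution. This Python sketch models the two escape forms above. The function name is hypothetical, and treating an undefined name as an empty string is our assumption.

```python
import re


def interpolate(text, strings):
    # \*(ab uses a two-character name, \*a a one-character name.
    def replace(match):
        name = match.group(1) or match.group(2)
        return strings.get(name, "")   # undefined name: empty (our assumption)
    return re.sub(r"\\\*\((..)|\\\*(.)", replace, text)


strings = {"nt": "Nroff and Troff"}
print(interpolate(r"Welcome to \*(nt.", strings))
```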
Strings use the same pool of names as macros. Defining a string with the same
name as an existing macro will make the macro inoperable, so it is not advisable to go
around wildly defining shorthand strings. The vi editor’s abbreviation facility
(described in Chapter 7) is a more effective way to save yourself work typing.
Strings are useful in macro design in much the same way number registers are:
they allow a macro to be defined in a more general way. For example, consider this
book, which prints the title of the chapter in the header on each odd-numbered page.
The chapter title is not coded into the page top macro. Instead, a predefined string is
interpolated there. The same macro that describes the format of the chapter title on the
first page of the chapter also defines the string that will appear in the header.
In using each of the existing macro packages, you may be asked to define or
interpolate the contents of an existing string. For the most part, though, string defini-
tions are hidden inside macro definitions, so you may not run across them. However,
there are a couple of handy predefined strings you may find yourself using, such as:
\*(DY
which always contains the current date in the ms macro package. (The equivalent
string in mm is \*(DT.) For example, if you wanted a form letter to contain the date
that it was formatted and printed rather than the date it was written, you could interpo-
late this string.

Just What Is a Macro Package?


Before leaving the topic of macros, we ought to take a moment to treat a subject we
have skirted up to this point: just what is a macro package?
As the name suggests, a macro package is simply a collection of macro defini-
tions. The fact that there are command-line options for using the existing packages may
seem to give them a special status, but they are text files that you can read and modify
(assuming that your system has the UNIX file permissions set up so you can do so).

There is no magic to the options -ms and -mm. The actual option to nroff
and troff is -mx, which tells the program to look in the directory
/usr/lib/tmac for a file with a name of the form tmac.x. As you might expect,
this means that there is a file in that directory called tmac.s or tmac.m (depending
on which package you have on your system). It also means that you can invoke a
macro package of your own from the command line simply by storing the macro definitions
in a file with the appropriate pathname. This file will be added to any other files
in the formatting run. This means that if you are using the ms macros you could
achieve the same result by including the line:
.so /usr/lib/tmac/tmac.s

at the start of each source file, and omitting the command-line switch -ms. (The .so
request reads another file into the input stream, and when its contents have been
exhausted, returns to the current file. Multiple .so requests can be nested, not just to
read in macro definitions, but also to read in additional text files.)
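The mapping from option to file name is mechanical, as this Python sketch shows. It assumes the /usr/lib/tmac directory described above, although the location varies between systems. The function names are hypothetical.

```python
import posixpath

TMAC_DIR = "/usr/lib/tmac"    # as described above; varies between systems


def macro_file(option):
    # "-mx" names the file tmac.x in the macro directory.
    if not option.startswith("-m") or len(option) < 3:
        raise ValueError("expected an option of the form -mx: %r" % option)
    return posixpath.join(TMAC_DIR, "tmac." + option[2:])


def so_line(option):
    # The equivalent request to put at the top of a source file.
    return ".so " + macro_file(option)


print(macro_file("-mm"))   # /usr/lib/tmac/tmac.m
print(so_line("-ms"))      # .so /usr/lib/tmac/tmac.s
```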
The macros in the standard macro packages are no different (other than in com-
plexity) than the macros you might write yourself. In fact, you can print out and study
the contents of the existing macro packages to learn how they work. We’ll be looking
in detail at the actions of the existing macro packages, but for copyright reasons we
can’t actually show their internal design. We’ll come back to all this later. For now,
all you need to know is that macros aren’t magic, just an assemblage of simple commands
working together.
CHAPTER 5

The ms Macros

The UNIX shell is a user interface for the kernel, the actual heart of the operating system.
You can choose the C shell or Korn shell instead of the Bourne shell, without
worrying about its effects on the low-level operations of the kernel. Likewise, a macro
package is a user interface for accessing the capabilities of the nroff/troff formatter.
Users can select either the ms or mm macro packages (as well as other packages
that are available on some systems) to use with nroff/troff.
The ms package was the original Bell Labs macro package, and is available on
many UNIX systems, but it is no longer officially supported by AT&T. Our main reason
for giving ms equal time is that many Berkeley UNIX systems ship ms instead of
mm. In addition, it is a less complex package, so it is much easier to learn the principles
of macro design by studying ms than by studying mm.
A third general-purpose package, called me, is also distributed with Berkeley
UNIX systems. It was written by Eric Allman and is comparable to ms and mm.
(Mark Horton writes us: "I think of ms as the FORTRAN of nroff, mm as the PL/I,
and me as the Pascal.") The me package is not described in this book.
In addition, there are specialized packages: mv, for formatting viewgraphs,
mptx, for formatting the permuted index found in the UNIX Reference Manual, and
man, for formatting the reference pages in that same manual. These packages are simple
and are covered in the standard UNIX documentation.
Regardless of which macro package you choose, the formatter knows only to
replace each call of a macro with its definition. The macro definition contains the set of
requests that the formatter executes. Whether a definition is supplied with the text in
the input file or found in a macro package is irrelevant to nroff/troff. The formatter
can be said to be oblivious to the idea of a macro package.
You might not expect this rather freely structured arrangement between a macro
package and nroff/troff. Macros are application programs of sorts. They organize
the types of functions that you need to be able to do. However, the actual work is
accomplished by nroff/troff requests.
In other words, the basic formatting capabilities are inherent in nroff and
troff; the user implementation of these capabilities to achieve particular formats is
0 The m s Macros 0 105

accomplished with a macro package. If a macro doesn’t work the way you expect, its
definition may have been modified. It doesn’t mean that nroff/troff works differently
on your system. It is one thing to say “nroff/troff won’t let me do it,”
and another to say “I don’t have the macro to do it (but I could do it, perhaps).”
A general-purpose macro package like ms provides a way of describing the format
of various kinds of documents. Each document presents its own specific problems,
and macros help to provide a simple and flexible solution. The ms macro package is
designed to help you format letters, proposals, memos, technical papers, and reports.
For simple documents such as letters, ms offers few advantages over the basic format
requests described in Chapter 4. But as you begin to format more complex documents,
you will quickly see the advantage of working with a macro package, which provides
specialized tools for so many of the formatting tasks you will encounter.
A text file that contains ms macros can be processed by either nroff or
troff, and the output can be displayed on a terminal screen or printed on a line
printer, a laser printer, or a typesetter.

Formatting a Text File with ms


If you want to format an ms document for a line printer or for a terminal screen, enter
this command line:
$ nroff -ms file(s)
To format for a laser printer or typesetter, enter this command line:
$ troff -ms file(s) | devicepostprocessor
Be sure to redirect the output to a file or pipe it to the printer; if you do not, the output
will be sent to your terminal screen.

Problems in Getting Formatted Output


There are two ways for a program to handle errors. One is to have the program terminate
and issue an error message. The other way is to have it keep going in hopes that
the problems won’t affect the rest of the output. The ms macros take this second
approach.
In general, ms does its best to carry on no matter how scrambled the output
looks. Sometimes the problems do get corrected within a page or two; other times the
problem continues, making the remaining pages worthless. Usually, this is because the
formatter had a problem executing the codes as they were entered in the input file.
Most of the time input errors are caused by not including one of the macros that must
be used in pairs.
Because ms allows formatting to continue unless the error is a “fatal” one, error
correction is characteristic of the ms macro definitions. Apart from the main function
of the macro, some of them, such as the paragraph macro, also invoke another macro
called .RT to restore certain default values.

Thus, if you forget to reset the point size or indentation, you might notice that the
problem continues for a while and then stops.

Page Layout
As suggested in the last chapter, one of the most important functions of a macro package
is that it provides basic page layout defaults. This feature makes it worthwhile to
use a macro package even if you don’t enter a single macro into your source file.
At the beginning of Chapter 4, we showed how nroff alone formatted a sample
letter. If we format the same letter with ms, the text will be adjusted on a page that
has a default top and bottom margin of 1 inch, a default left margin, or page offset, of
about 1 inch, and a default line length of 6 inches.
All of these default values are stored in number registers so that you can easily
change them:

LL Line Length
HM Header (top) Margin
FM Footer (bottom) Margin
PO Page offset (left margin)

For example, if you like larger top and bottom margins, all you need to do is
insert the following requests at the top of your file:
.nr HM 1.5i
.nr FM 1.5i

Registers such as these are used internally by a number of ms macros to reset the
formatter to its default state. They will not take effect until one of those “reset” macros
is encountered. In the case of HM and FM, they will not take effect until the next
page unless they are specified at the very beginning of the file.*

Paragraphs
As we saw in the last chapter, paragraph transitions are natural candidates for macros
because each paragraph generally will require several requests (spacing, indentation) for
proper formatting.
There are four paragraph macros in ms:

*These “reset” macros (those that call the internal macro .RT) include .LP, .PP, .IP, .QP,
.SH, .NH, .RS, .RE, .TS, and .TE. The very first ms macro calls a special initialization
macro called .BG that is used only once, on the first page. This macro prints the cover sheet, if any (see
“Cover Sheet Macros” later in this chapter), as well as performing some special first-page initialization.

.LP   Block paragraph
.PP   First line of paragraph indented
.QP   Paragraph indented from both margins
.IP   Paragraph with hanging indent (list item)

The .LP macro produces a justified, block paragraph. This is the type of paragraph
used for most technical documentation. The .PP macro produces a paragraph
with a temporary indent for the first line. This paragraph type is commonly used in
published books and magazines, as well as in typewritten correspondence.
Let’s use the same letter to illustrate the use of these macros. In the original
example (in Chapter 4), we left blank lines between paragraphs, producing an effect
similar to that produced by the .LP macro.
In contrast, .PP produces a standard indented paragraph. Let’s code the letter
using .PP macros. Because this is a letter, let’s also disable justification with an
.na request. And of course, we want to print the address block in no-fill mode, as
shown in Chapter 4. Figure 5-1 shows the coded letter and Figure 5-2 shows the formatted
output.

Spacing between Paragraphs


With nroff, all of the paragraph macros produce a full space between paragraphs.
However, with troff, the paragraph macros output a blank space of 0.3v. Basically,
this means that a blank line will output one full space and the paragraph macros will
output about a third of that space.
The amount of spacing between paragraphs is contained in the number register
PD (paragraph distance). If you want to change the amount of space generated by any
of the paragraph macros, simply change the contents of this register.
For example, if you don’t want to leave any space between paragraphs in the
letter, you could put the following line at the start of your file:
.nr PD 0
This flexibility afforded by macro packages is a major advantage. It is often possible to
completely change the appearance of a coded document by resetting only a few number
registers at the start of a file. (As we’ll see, this statement is even more true of mm
than of ms.)
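In troff, PD is expressed in v units, so the actual gap depends on the current vertical spacing. This quick Python check of the arithmetic assumes 12-point vertical spacing, a value chosen only for illustration.

```python
from fractions import Fraction


def paragraph_gap_points(pd_in_v, vs_points):
    # Space before each paragraph when PD is given in v units.
    return Fraction(str(pd_in_v)) * vs_points


print(float(paragraph_gap_points(0.3, 12)))   # 3.6: troff's 0.3v default
print(paragraph_gap_points(1, 12))            # 12: a full line, as in nroff
print(paragraph_gap_points(0, 12))            # 0: after .nr PD 0
```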

Quoted Paragraphs
A paragraph that is indented equally from the left and right margins is typically used to
display quoted material. It is produced by .QP. For example:
.QP
In the next couple of days, I’ll be putting together a ...

.ad r
April 1, 1987
.sp 2
.ad
.nf
Mr. John Fust
Vice President, Research and Development
Gutenberg Galaxy Software
Waltham, Massachusetts 02159
.fi
.sp
.na
Dear Mr. Fust:
.PP
In our conversation last Thursday, we discussed a documentation
project that would produce a user's manual on the Alcuin
product. Yesterday, I received the product demo and other
materials that you sent me.
.PP
Going through a demo session gave me a much better understanding
of the product. I confess to being amazed by Alcuin.
Some people around here, looking over my shoulder, were also
astounded by the illustrated manuscript I produced with Alcuin.
One person, a student of calligraphy, was really impressed.
.PP
In the next couple of days, I'll be putting together a written
plan that presents different strategies for documenting the
Alcuin product. After I submit this plan, and you have had time
to review it, let's arrange a meeting at your company to discuss
these strategies.
.PP
Thanks again for giving us the opportunity to bid on this
documentation project. I hope we can decide upon a strategy
and get started as soon as possible in order to have the manual
ready in time for the first customer shipment. I look forward to
meeting with you towards the end of next week.
.sp
Sincerely,
.sp 3
Fred Caslon

Fig. 5-1. Letter Coded with ms Macros



April 1, 1987

Mr. John Fust
Vice President, Research and Development
Gutenberg Galaxy Software
Waltham, Massachusetts 02159
Dear Mr. Fust:
In our conversation last Thursday, we discussed
a documentation project that would produce a user's
manual on the Alcuin product. Yesterday, I received
the product demo and other materials that you sent
me.
Going through a demo session gave me a much
better understanding of the product. I confess to
being amazed by Alcuin. Some people around here,
looking over my shoulder, were also astounded by the
illustrated manuscript I produced with Alcuin. One
person, a student of calligraphy, was really
impressed.
In the next couple of days, I'll be putting
together a written plan that presents different
strategies for documenting the Alcuin product. After
I submit this plan, and you have had time to review
it, let's arrange a meeting at your company to dis-
cuss these strategies.
Thanks again for giving us the opportunity to
bid on this documentation project. I hope we can
decide upon a strategy and get started as soon as
possible in order to have the manual ready in time
for the first customer shipment. I look forward to
meeting with you towards the end of next week.
Sincerely,

Fred Caslon

Fig. 5-2. Formatted Output



The .QP macro produces a single paragraph indented on both sides. The pair of macros
.QS and .QE marks an indented section that is longer than one paragraph. This is
useful in reports and proposals that quote at length from another source.
.LP
I was particularly interested in the following comment
found in the product specification:
.QS
Users first need a brief introduction to what
the product does. Sometimes this is more for the
benefit of people who haven't yet bought the
product, and are just looking at the manual.
However, it also serves to put the rest of the
manual, and the product itself, in
the proper context.
.QE
The result of formatting is:

I was particularly interested in the following comment
found in the product specification:

     Users first need a brief introduction to what the
     product does. Sometimes this is more for the benefit
     of people who haven't yet bought the product,
     and are just looking at the manual. However, it
     also serves to put the rest of the manual, and the
     product itself, in the proper context.

Use the .QP macro inside a .QS/.QE block to break up paragraphs.
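To make the geometry concrete, here is a rough plain-text model in Python (it is not part of ms) of a quoted paragraph: the text is filled to a narrower measure and shifted in, so the same amount of white space appears on the left and on the right. The function name, the 60-column line length, and the 5-column indent are assumptions chosen for a fixed-width, nroff-style page; the real macro takes its indent from ms itself.

```python
import textwrap


def quoted_paragraph(text, line_length=60, indent=5):
    """Fill text so it is indented `indent` columns from BOTH margins.

    A hypothetical plain-text model of a .QP paragraph on a fixed-width
    page; the parameter values are assumptions, not ms defaults.
    """
    pad = " " * indent
    # textwrap counts the left indent as part of `width`, so taking the
    # indent off the line length once leaves the same margin on the right.
    return textwrap.fill(text, width=line_length - indent,
                         initial_indent=pad, subsequent_indent=pad)


sample = ("Users first need a brief introduction to what the product does. "
          "Sometimes this is more for the benefit of people who haven't yet "
          "bought the product, and are just looking at the manual.")
print(quoted_paragraph(sample))
```

Every output line starts five columns in and ends at least five columns short of the 60-column line, which is the effect .QP produces on the page.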


Indented Paragraphs
The .IP macro produces an entire paragraph indented from the left margin. This is
especially useful for constructing lists, in which a mark of some kind (e.g., a letter or
number) extends into the left margin. We call these labeled item lists.
The .IP macro takes three arguments. The first argument is a text label; if the
label contains spaces, it should be enclosed within quotation marks. The second argument
is optional and specifies the amount of indentation; a default of 5 is used if the
second argument is not specified. A third argument of 0 inhibits spacing before the
indented paragraph.
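The hanging layout that .IP produces can be modeled in a few lines of Python. This is only a sketch under stated assumptions: fixed-width characters, a label narrower than the indent, and an arbitrary 50-column line; the function name is hypothetical and the code says nothing about how the macro is actually written.

```python
import textwrap


def indented_paragraph(label, text, indent=5, width=50):
    """Plain-text model of an .IP item: the label hangs in the indent.

    Assumes fixed-width characters and a label narrower than the indent;
    the default indent of 5 mirrors the text above, the width is arbitrary.
    """
    if len(label) >= indent:
        raise ValueError("this sketch needs a label narrower than the indent")
    body = textwrap.wrap(text, width=width - indent) or [""]
    first = label.ljust(indent) + body[0]               # label, padded to the indent
    rest = [" " * indent + line for line in body[1:]]   # the hanging lines
    return "\n".join([first] + rest)


print(indented_paragraph(
    "figure",
    "is the name of a cataloged figure. If a figure has not been "
    "cataloged, you need to use the LOCATE command.",
    indent=10))
```

The label occupies the first ten columns of the first line only; every later line starts at the indent, which is what makes the label stand out in the margin.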
Item lists are useful in preparing command reference pages that describe various
syntax items, and in glossaries that present a term in one column and its definition in
the other. The following example shows a portion of the input file for a reference page:

.IP figure 10
is the name of a cataloged figure. If
a figure has not been cataloged, you need to use
the LOCATE command.
.IP f:p 10
is the scale of the
figure in relation to the page.
.IP font 10
is the two-character abbreviation or
full name of one of the available fonts
from the Alcuin library.
The following item list is produced:

figure    is the name of a cataloged figure. If a figure
          has not been cataloged, you need to use the
          LOCATE command.

f:p       is the scale of the figure in relation to the
          page.

font      is the two-character abbreviation or full name
          of one of the available fonts from the Alcuin
          library.
An .LP or .PP should be specified after the last item so that the text following the
list is not also indented.
If you want to indent the label as well as the paragraph, you can use the .in
request around the list. The following example:
.in 10
.IP figure 10
is the name of a cataloged figure. If
a figure has not been cataloged, you need to use
the LOCATE command.
.in 0
will produce:

          figure    is the name of a cataloged figure. If a
                    figure has not been cataloged, you need to
                    use the LOCATE command.
You can specify an absolute or relative indent. To achieve the effect of a nested list,
you can use the .RS (you can think of this as either relative start or right shift) and
.RE (relative end or retreat) macros:

.IP font 10
is the two-character abbreviation or
full name of one of the available fonts
from the Alcuin library.
.RS
.IP CU
Cursive
.IP RS
Slanted
.RS
.IP LH 5 0
Left handed
.IP RH 5 0
Right handed
.RE
.IP BL
Block
.RE
The labels on the second level are aligned with the indented left margin of paragraphs
on the first level.

font      is the two-character abbreviation or full name of
          one of the available fonts from the Alcuin
          library.
          CU   Cursive
          RS   Slanted
               LH   Left handed
               RH   Right handed
          BL   Block
One thing you need to watch out for in using the .IP macro is an ordinary space in
the label argument. Because of the way the macro is coded, the space may be expanded
when the finished line is adjusted, so the first line will not be aligned with the rest. For
example:
.IP "font name" 10
is the two-character abbreviation or full name ...

might produce the following:

font name is the two-character abbreviation or full
name of one of the available fonts from the
Alcuin library.

To avoid this problem, always use an unpaddable space (a backslash followed by a
space) to separate words in the label argument to .IP. This caution applies to many
other formatting situations as well.
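Why an ordinary space causes trouble can be seen in a toy model of adjustment. The Python sketch below widens every interword gap until the line reaches a fixed width; it is not troff's algorithm (troff distributes the extra space in its own way), and the Unicode no-break space merely stands in for troff's "\ " (troff itself prints an unpaddable space at ordinary width). The point is only that an ordinary space inside the label counts as an interword gap and gets stretched, while an unpaddable one does not.

```python
NBSP = "\u00a0"  # stand-in for troff's unpaddable space "\ "


def justify(words, width):
    """Toy model of line adjustment: widen interword gaps until the line
    is exactly `width` characters long. Not troff's actual algorithm."""
    if len(words) == 1:
        return words[0].ljust(width)
    gaps = len(words) - 1
    spare = width - sum(len(w) for w in words)
    if spare < gaps:
        raise ValueError("words do not fit in width")
    base, extra = divmod(spare, gaps)
    out = []
    for i, word in enumerate(words[:-1]):
        # The first `extra` gaps get one more space than the rest.
        out.append(word + " " * (base + (1 if i < extra else 0)))
    out.append(words[-1])
    return "".join(out)


# split(" ") splits only on ordinary spaces, so the NBSP keeps
# "font name" together as one word.
plain = justify("font name is the two-character".split(" "), 40)
fixed = justify(f"font{NBSP}name is the two-character".split(" "), 40)
print(repr(plain))
print(repr(fixed))
```

In the first line the gap inside "font name" is padded like any other; in the second, the label stays intact and all of the padding goes between the real words.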
Automatically numbered and alphabetized lists are not provided for in ms.
(Chapter 16 shows how to write your own macros for this.) However, by specifying the
number or letter as a label, you can make do with the .IP macro. For example:
User-oriented documentation recognizes three things:
.in +3n
.IP 1) 5n
that a new user needs
to learn the system in stages, getting a sense of the
system as a whole while becoming proficient in performing
particular tasks;
.IP 2) 5n
that there are different levels of users, and not
every user needs to learn all the capabilities
of the system in order to be productive;
.IP 3) 5n
that an experienced user must be able to rely on
the documentation for accurate and thorough reference
information.
.in -3n
This produces:

User-oriented documentation recognizes three things:

   1)   that a new user needs to learn the system in
        stages, getting a sense of the system as a
        whole while becoming proficient in performing
        particular tasks;

   2)   that there are different levels of users, and
        not every user needs to learn all the capabilities
        of the system in order to be productive;

   3)   that an experienced user must be able to rely on
        the documentation for accurate and thorough
        reference information.
The number is indented three ens and the text is indented five more ens. (Note: If you
are using nroff, you don’t need to specify units on the indents. However, if you are
using troff, the default scaling for both the .IP macro and the .in requests
shown in the previous example is ems. Remember that you can append a scaling indicator
to the numeric arguments of most macros and troff requests.)
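If you build such lists often, the repetitive .IP lines can be generated. The Python helper below is hypothetical (nothing like it ships with ms); it simply writes out the input pattern shown above, using explicit n units so the result means the same thing under troff. Item text that begins with a period or apostrophe would be read as a request, so the helper prefixes it with the zero-width character \&.

```python
def numbered_list(items, label_indent="3n", text_indent="5n"):
    """Emit ms input for a numbered list built from .IP labels.

    Hypothetical helper; the defaults follow the example above, and the
    explicit "n" units keep troff from assuming ems.
    """
    lines = [f".in +{label_indent}"]
    for number, item in enumerate(items, start=1):
        lines.append(f".IP {number}) {text_indent}")
        # A text line starting with "." or "'" would be read as a request;
        # the zero-width character \& protects it.
        if item.startswith((".", "'")):
            item = "\\&" + item
        lines.append(item)
    lines.append(f".in -{label_indent}")
    return "\n".join(lines)


print(numbered_list(["that a new user needs to learn the system in stages;",
                     "that there are different levels of users;"]))
```

The output can be pasted into an input file or produced by a script; the formatted result is the same as typing the .IP calls by hand.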

Changing Font and Point Size


When you format with nroff and print on a line printer, you can put emphasis on
individual words or phrases by underlining or overstriking. When you use troff
and send your output to a laser printer or typesetter, you can specify variations of typeface,
font, and point size based on the capabilities of the output device.

Roman, Italic, and Bold Fonts


Most typefaces have at least three fonts available: roman, bold, and italic. Normal
body copy is printed in the roman font. You can change temporarily to a bold or italic
font for emphasis. In Chapter 4, you learned how to specify font changes using the
.ft request and the inline \f escape sequence. The ms package provides a set of mnemonic
macros for changing fonts:

.B bold
.I italic
.R roman

Each macro prints a single argument in a particular font. You might code a single sen-
tence as follows:
.B Alcuin
revitalizes an
.I age-old
tradition.
The printed sentence has one word in bold and one in italic.

Alcuin revitalizes an age-old tradition.


If no argument is specified, the selected font is current until it is explicitly changed:
The art of
.B
calligraphy
.R
is, quite simply,
.I
beautiful
.R
handwriting;
The example produces:

The art of calligraphy is, quite simply, beautiful handwriting;



You've already seen that the first argument is printed in the selected font. If you
supply a second argument, it is printed in the previous font. (You are limited to two
arguments, separated by a space; a phrase must be enclosed within quotation marks to be
taken as a single argument.) A good use for the second argument is to supply punctuation,
especially because of the restriction that you cannot begin a line with a period.
its opposite is
.B cacography .
This example produces:

its opposite is cacography.

If the second argument is a word or phrase, you must supply the spacing:
The ink pen has been replaced by a
.I light " pen."
This produces:

The ink pen has been replaced by a light pen.
If you are using nroff, specifying a bold font results in character overstrike; specifying
an italic font results in an underline for each character (not a continuous rule).
Overstriking and underlining can cause problems on some printers and terminals.
The chief advantage of these macros over the corresponding troff constructs is
the ease of entry. It is easier to type:
.B calligraphy

than:
\fBcalligraphy\fP

However, you'll notice that using these macros changes the style of your input consider-
ably. As shown in the examples on the preceding pages, these macros require you to
code your input file using short lines that do not resemble the resulting filled output
text.
This style, which clearly divorces the form of the input from the form of the output,
is recommended by many nroff and troff users. They recommend that you
use macros like these rather than inline codes, and that you begin each sentence or
clause on a new line. There are advantages in speed of editing. However, there are
others (one of the authors included) who find this style of input unreadable on the
screen, and prefer to use inline codes, and to keep the input file as readable as possible.
(There is no difference in the output file.)

Underlining
If you want to underline a single word, regardless of whether you are using nroff or
troff, use the .UL macro:
the
.UL art
of calligraphy.

It will print a continuous rule beneath the word. You cannot specify more than a sin-
gle word with this macro.

Changing Point Size


As discussed in Chapter 4, you can change the point size and vertical spacing with the
.ps and .vs requests. However, if you do this in ms, you will find that the point
size and vertical spacing revert to 10 and 12 points, respectively, after the next paragraph
macro. This is because the paragraph macro, in addition to other tasks, resets the
point size and vertical spacing (along with various other values) to default values stored
in number registers.
The default point size and vertical spacing for a document are kept in the registers
PS and VS, respectively. If you want to change the overall point size or vertical spacing,
change the value in these registers. (The default values are 10 and 12, respectively.)
For example, to change the body type to 8 points and the spacing to 10 points,
enter the following requests at the top of your document:
.nr PS 8
.nr VS 10

At the top of a document, these settings will take effect immediately. Otherwise, you
must wait for the next paragraph macro for the new values to be recognized. If you
need both immediate and long-lasting effects, you may need a construct like:
.ps 8
.nr PS 8
.vs 10
.nr VS 10

There are also several macros for making local point size changes. The .LG macro
increases the current point size by 2 points; the .SM macro decreases the point size by
2 points. The new point size remains in effect until you change it. The .NL macro
changes the point size back to its default or normal setting. For example:

.LG
Alcuin
.NL
is a graphic arts product for
.SM
UNIX
.NL
systems.

The following line is produced:

Alcuin is a graphic arts product for UNIX systems.
The .LG and .SM macros simply increment or decrement the current point size
by 2 points. Because you change the point size relative to the current setting, repeating
a macro adds or subtracts 2 more points. If you are going to change the point size by
more than 2, it makes more sense to use the .ps request. The .NL macro uses the
value of the number register PS to reset the normal point size. Its default value is 10.
In the following example, the .ps request changes the point size to 12. The
.LG and .SM macros increase and decrease the point size relative to 12 points. The
.NL macro is not used until the end because it changes the point size back to 10.
.ps 12
.LG
Alcuin
.SM
is a graphic arts product for
.SM
UNIX
.LG
systems.
.NL
It produces the following line:

Alcuin is a graphic arts product for UNIX systems.


A change in the point size affects how much vertical space is needed for the larger or
smaller characters. Vertical spacing is usually 2 points larger than the point size (10 on
12). Use the vertical spacing request to temporarily change the vertical spacing, if
necessary.
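The rules in this section (a .ps change lasts only until the next paragraph, the PS register supplies the size that paragraph macros and .NL restore, and .LG/.SM step by 2 points from wherever you are) can be summarized in a small Python model. It is a sketch for checking your reasoning, not a description of the ms source; the class and method names are hypothetical, and vertical spacing is left out because VS and .vs follow the same pattern.

```python
class PointSize:
    """Toy model of the ms point-size rules described in this section.

    Only the point size is tracked; vertical spacing (VS and .vs) follows
    the same pattern and is omitted. Method names are hypothetical.
    """

    def __init__(self, normal=10):
        self.normal = normal   # number register PS
        self.size = normal     # current point size

    def ps(self, n):           # .ps n: immediate, undone by the next paragraph
        self.size = n

    def nr_ps(self, n):        # .nr PS n: takes effect at the next paragraph
        self.normal = n

    def pp(self):              # .PP/.LP: reset the size from register PS
        self.size = self.normal

    def lg(self):              # .LG: 2 points larger than the current size
        self.size += 2

    def sm(self):              # .SM: 2 points smaller than the current size
        self.size -= 2

    def nl(self):              # .NL: back to the normal size (register PS)
        self.size = self.normal


# Replay the .ps 12 example from the text.
m = PointSize()
m.ps(12)
sizes = []
for step in (m.lg, m.sm, m.sm, m.lg, m.nl):
    step()
    sizes.append(m.size)
print(sizes)  # [14, 12, 10, 12, 10]
```

The replay matches the example: "Alcuin" is set at 14 points, "UNIX" at 10, and .NL returns to the register value of 10 rather than to the 12 set by .ps.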

Displays
A document often includes material, such as tables, figures, or equations, that is not
part of the running text and must be kept together on the page. In ms and mm, such
document elements are referred to generically as displays.
The macros .DS, .DE, .ID, .CD, .LD, and .BD are used to handle displays in
ms. The display macros can be relied upon to provide

adequate spacing before and after the display;
horizontal positioning of the display as a left-justified, indented, or centered block;
proper page breaks, keeping the entire display together.

The default action of the .DS macro is to indent the block of text without filling lines:
Some of the typefaces that are currently available are:
.DS
Roman
Caslon
Baskerville
Helvetica
.DE
This produces:

Some of the typefaces that are currently available are:

Roman
Caslon
Baskerville
Helvetica

You can select a different format for a display, such as left-justified or centered, by
giving .DS one of the following arguments:

I Indented (default)
L Left-justified
C Center each line
B Block (center entire display)

The L argument can be used for formatting an address block in a letter:


.DS L
Mr. John Fust
Vice President, Research and Development
Gutenberg Galaxy Software
Waltham, Massachusetts 02154
.DE
The display macro prevents these lines from being filled; it “protects” the carriage
returns as they were entered in the file.
A display can be centered in two ways: either each individual line in the display
is centered (C), or the entire display is centered as a block (B) based on the longest line
of the display.
The use of tabs often presents a problem outside of displays. Material that has
been entered with tabs in the input file should be formatted in no-fill mode, the default
setting of the display macros. The following table was designed using tabs to provide
the spacing.

.DS L
Dates           Description of Task

June 30         Submit audience analysis
July 2          Meeting to review audience analysis
July 15         Submit detailed outline
August 1        Submit first draft
August 5        Return of first draft
August 8        Meeting to review comments
                and establish revisions
.DE
This table appears in the output just as it looks in the file. If this material had not been
processed inside a display, the columns would be improperly aligned.

Static and Floating Displays


One of the basic functions of a display is to make sure the displayed material stays
together on one page. If the display is longer than the distance to the bottom of the
page, there is a page break.
If the display is large, causing a page break can leave a large block of white space
at the bottom of the page. To avoid this problem, ms provides a set of macros for
floating displays, as well as macros for the static displays we’ve already discussed. If a
floating display doesn’t fit on the page, the formatter doesn’t force a page break.
Instead, it simply holds the displayed text in reserve while it fills up the remainder of
the page with the text following the display. It prints the display at the top of the next
page, then continues where it left off.
We have already used .DS and .DE to mark the beginning and end of a static
display. To specify a floating display, the closing mark is the same but the beginning is
marked by a different macro:

.ID Same as .DS I (indented) but floating
.LD Same as .DS L (left justified) but floating
.CD Same as .DS C (center each line) but floating
.BD Same as .DS B (center display) but floating
In the following example of an input file, numbers are used instead of actual lines
of text to make the placement of the display more obvious:

1
2
3
4
5
.LD
Long Display
.DE
6
7
8
9
10

The following two formatted pages might be produced, assuming that there are a suffi-
cient number of lines to cause a page break:

-1-               -2-

1                 Long Display
2                 8
3                 9
4                 10
5
6
7

If there had been room on page 1 to fit the display, it would have been placed there, and
lines 6 and 7 would have followed the display, as they did in the input file.
If a static display had been specified in the previous example, the display would
be placed in the same position on the second page, and lines 6 and 7 would have fol-
lowed it, leaving extra space at the bottom of page 1. A floating display attempts to
make the best use of the available space on a page.
The formatter maintains a queue to hold floating displays that it has not yet out-
put. When the top of a page is encountered, the next display in the queue is output.
The queue is emptied in the order in which it was filled (first in, first out).
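The placement rule can be simulated to see why the example comes out as it does. In the Python sketch below, the strings "1" through "10" stand for the numbered input lines, the display is assumed to be 4 lines tall, and the page is assumed to hold 7 lines; those numbers are chosen only to reproduce the two pages shown. The function is a hypothetical model of the behavior described above (hold a display that doesn't fit, keep filling the page, output the oldest held display at the top of the next page), not the ms implementation, and it assumes every display fits on an empty page.

```python
from collections import deque


def paginate(items, page_lines):
    """Toy model of floating displays: returns a list of pages.

    `items` holds one-line strings, plus ("display", name, height) tuples for
    floating displays. Assumes every display fits on an empty page. This is
    a sketch of the behavior described above, not the ms implementation.
    """
    pages = [[]]
    used = 0
    held = deque()   # floating displays not yet output, first in, first out

    def new_page():
        nonlocal used
        pages.append([])
        used = 0
        if held:     # top of page: output the next display in the queue
            name, height = held.popleft()
            pages[-1].append(name)
            used += height

    for item in items:
        if isinstance(item, tuple):
            _, name, height = item
            # Place it now only if nothing is waiting ahead of it and it fits.
            if not held and used + height <= page_lines:
                pages[-1].append(name)
                used += height
            else:
                held.append((name, height))
            continue
        if used + 1 > page_lines:
            new_page()
        pages[-1].append(item)
        used += 1

    while held:      # flush displays still waiting at the end
        new_page()
    return pages


lines = [str(n) for n in range(1, 11)]
doc = lines[:5] + [("display", "Long Display", 4)] + lines[5:]
for number, page in enumerate(paginate(doc, page_lines=7), start=1):
    print(number, page)
```

With a 7-line page, lines 6 and 7 move up to fill page 1 and the display opens page 2; with a page long enough to hold everything, the display simply stays where it was in the input.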
The macros called by the display macros to control output of a block of text are
available for other uses. They are known as “keep and release” macros. The pair
.KS/.KE keeps a block together, moving it to the next page if it does not fit on the current
one. The pair .KF/.KE specifies a floating keep; the block saved by the keep can float, and
lines of text following the block may appear before it in the text.

Headings
In ms, you can have numbered and unnumbered headings. There are two heading
macros: .NH for numbered headings and .SH for unnumbered section headings.
Let’s first look at how to produce numbered headings. The syntax for the .NH
macro is:
.NH [level]
[heading text]
.LP

(The brackets indicate optional arguments.) You can supply a numerical value indicat-
ing the level of the heading. If no value is provided for level, then a top-level heading
is assumed. The heading text begins on the line following the macro and can extend
over several lines. You have to use one of the paragraph macros, either .LP or .PP,
after the last line of the heading. For example:
.NH
Quick Tour of Alcuin
.LP
The result is a heading preceded by a first-level heading number:

1. Quick Tour of Alcuin
The next time you use this macro the heading number will be incremented to 2, and
after that, to 3.
You can add levels by specifying a numeric argument. A second-level heading is
indicated by 2:
.NH 2
Introduction to Calligraphy
.LP
The first second-level heading number is printed:
1.1 Introduction to Calligraphy

When another heading is specified at the same level, the heading number is automatically
incremented. If the next heading is at the second level:
.NH 2
Digest of Alcuin Commands
.LP

ms produces:

1.2 Digest of Alcuin Commands

Each time you go to a new level, .1 is appended to the number representing the existing
level. That number is incremented for each call at the same level. When you back
out of a level (for instance, when you go from level 5 to 4) the counter for the level (in
this case level 5) is reset to 0.
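The numbering rule just described amounts to a list of counters, one per level. The Python sketch below is a hypothetical model of that rule, written from the description above rather than from the ms source; it returns only the digits, since the printed form (for instance, whether a trailing period follows the number) differs between versions of ms.

```python
def number_headings(levels):
    """Return the number ms would assign to each .NH heading.

    `levels` lists the level argument of successive .NH calls (use 1 when
    the argument is omitted). A sketch of the rule described in the text;
    the printed form (for example, a trailing period) varies between
    ms versions and is not modeled.
    """
    counters = []
    labels = []
    for level in levels:
        while len(counters) < level:   # going deeper: new counters start at 0
            counters.append(0)
        del counters[level:]           # backing out: deeper counters reset
        counters[level - 1] += 1
        labels.append(".".join(str(c) for c in counters))
    return labels


print(number_headings([1, 2, 2, 1, 2]))
```

Dropping the deeper counters when you back out is what makes the next second-level heading under a new top-level section start again at .1.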
The macro for unnumbered headings is .SH:
.SH
Introduction to Calligraphy
.LP
Unnumbered headings and numbered headings can be intermixed without affecting the
numbering scheme:

1. Quick Tour of Alcuin

Introduction to Calligraphy

1.1 Digest of Alcuin Commands

Headings are visible keys to your document’s structure. Their appearance can
contribute significantly to a reader’s recognition of that organization. If you are using
unnumbered headings, it becomes even more important to make headings stand out. A
simple thing you can do is use uppercase letters for a first-level heading.

Cover Sheet Macros


In their original incarnation at Bell Laboratories, the ms macros were called on to format
many internal AT&T documents. Accordingly, it is not surprising that there were
quite a few macros that controlled the format of specific internal document types. What
is surprising is that these macros are still present in copies of the ms macros distributed
outside of AT&T.
You have the option of specifying that your document contains Engineer’s Notes
(.EG), an Internal Memorandum (.IM), a Memorandum for Record (.MR), a
Memorandum for File (.MF), a Released Paper (.RP), a Technical Reprint (.TR), or a
letter (.LT).
Many of these formats are quite useless outside of AT&T, unless you customize
them heavily for other institutions. We prefer simply to ignore them.
In general, what these document type macros control is the appearance of the
document’s cover sheet. The content of that cover sheet is specified using the following
macros:

.TL Title
.AU Author
.AI Author’s Institution
.AB Abstract Start
.AE Abstract End

These macros are general enough that you can still use them even if you aren’t from
Bell Laboratories.
Each macro takes its data from the following line(s) rather than from an argument.
They are typically used together. For example:
.TL
UNIX Text Processing
.AU
Dale Dougherty
.AU
Tim O’Reilly

.AI
O’Reilly & Associates, Inc.
.AB
This book provides a comprehensive introduction to the major
UNIX text-processing tools. It includes a discussion of
vi, ex, nroff, and troff, as
well as many other text-processing programs.
.AE
.LP
Exactly how the output will look depends on which document types you have selected.
If you don’t specify any of the formats, you will get something like this:

UNIX Text Processing

Dale Dougherty
Tim O’Reilly
O’Reilly & Associates, Inc.

ABSTRACT
This book provides a comprehensive introduction to
the major UNIX text-processing tools. It includes a
discussion of vi, ex, nroff, and troff, as
well as many other text-processing programs.
You can specify as many title lines as you want following .TL. The macro will be
terminated by any of the other cover sheet macros, or by any paragraph macro. For
multiple authors, .AU and .AI can be repeated up to nine times.
The cover sheet isn’t actually printed until a reset (such as that caused by any of
the paragraph macros) is encountered, so if you want to print only a cover page, you
should conclude it with a paragraph macro even if there is no following text.
In addition, if you use these macros without one of the overall document type
macros like .RP, the cover sheet will not be printed separately. Instead, the text will
immediately follow. Insert a .bp if you want a separate cover sheet.

Miscellaneous Features

Putting Information in a Box


Another way of handling special information is to place it in a box. Individual words
can be boxed for emphasis using the .BX macro:

To move to the next menu, press the
.BX RETURN
key.
This draws a box around the word RETURN.
To move to the next menu, press the |RETURN| key.

As you can see, it might be a good idea to reduce the point size of the boxed word.
You can enclose a block of material within a box by using the pair of macros
.B1 and .B2:
.B1
.B
.ce
Note t o Reviewers
.R
.LP
Can you get a copy of a manuscript without annotations?
It seems to me that you should be
able to mark up a page with comments or
other scribbles while in Annotation Mode and
still obtain a printed copy without these marks.
Any ideas?
.SP
.B2
This example produces the following boxed section in troff:
Note to Reviewers
Can you get a copy of a manuscript without annotations? It seems to me that you
should be able to mark up a page with comments or other scribbles while in Annotation
Mode and still obtain a printed copy without these marks. Any ideas?

You may want to place boxed information inside a pair of keep or display macros. This
will prevent the box from being broken across a page boundary. If you use
these macros with nroff, you must also pipe your output through the col postprocessor
as described in Chapter 4.

Footnotes
Footnotes present special problems, chief among them printing the text at the bottom of the
page. The .FS macro indicates the start of the text for the footnote, and .FE indicates
the end of the text for the footnote. These macros surround the footnote text that
will appear at the bottom of the page. The .FS macro is put on the line immediately
following some kind of marker, such as an asterisk, that you supply in the text and in
the footnote.

... in an article on desktop publishing.*
.FS
* "Publish or Perish: Start-up grabs early page language
lead," Computerworld, April 21, 1986, p. 1.
.FE

All the footnotes are collected and output at the bottom of each page underneath a short
rule. The footnote text is printed in smaller type, with a slightly shorter line length than
the body text. However, you can change these if you want.
Footnotes in ms use an nroff/troff feature called environments (see
Chapter 14), so that parameters like line length or font that are set inside a footnote are
saved independently of the body text. So, for example, if you issued the requests:
.FS
.ft B
.ll -5n
.in +5n
Some text
.
.
.
.FE
the text within the footnote would be printed in boldface, with a 5-en indent, and the
line length would be shortened by 5 ens. The text following the footnote would be
unaffected by those formatting requests. However, the next time a footnote was called,
that special formatting would again be in effect.

*"Publish or Perish: Start-up grabs early page language
lead," Computerworld, April 21, 1986, p. 1.

If a footnote is too long to fit on one page, it will be continued at the bottom of the next
page.

Two-Column Processing
One of the nice features of the ms macros is the ease with which you can create multiple
columns and format documents, such as newsletters or data sheets, that are best
suited to a multicolumn format.
To switch to two-column mode, simply insert the .2C macro. To return to
single-column mode, use .1C. Because of the way two-column processing works in
ms, you can switch to two-column mode in the middle of a page, but switching back to
a single column forces a page break. (You'll understand the reason for this when we
return to two-column processing in Chapter 16.)
The default column width for two-column processing is 7/15 of the line length.
It is stored in the register CW (column width). The gutter between the columns is
1/15 of the line length, and is stored in the register GW (gutter width). By changing
the values in these registers, you can change the column and gutter width.
For more than two columns, you can use the .MC macro. This macro takes two
arguments, the column width and the gutter width, and creates as many columns as will
fit in the line length. For example, if the line length is 7 inches, the request:
.MC 2i .3i

would create three columns 2 inches wide, with a gutter of .3 inches between the
columns.
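The arithmetic behind these figures is easy to check. The Python sketch below uses exact fractions; its function names are hypothetical, and the column-count formula is an assumption (n columns need n column widths plus n - 1 gutters, with no gutter after the last column), so a particular version of ms may round its count differently.

```python
from fractions import Fraction


def default_two_column(line_length):
    """Default ms two-column geometry from the text: CW is 7/15 and GW is
    1/15 of the line length. Returns (column width, gutter width)."""
    ll = Fraction(line_length)
    return ll * Fraction(7, 15), ll * Fraction(1, 15)


def columns_that_fit(line_length, column_width, gutter_width):
    """How many columns .MC would set, assuming n columns need
    n * column_width + (n - 1) * gutter_width <= line_length, i.e. no
    gutter after the last column. A given ms version may round differently."""
    ll, cw, gw = (Fraction(x) for x in (line_length, column_width, gutter_width))
    return int((ll + gw) // (cw + gw))


cw, gw = default_two_column(7)
print(float(cw), float(gw))            # column and gutter widths, in inches
print(columns_that_fit(7, 2, "0.3"))   # the .MC 2i .3i example
```

The defaults use the whole line exactly (two sevenths-of-fifteen columns plus one fifteenth for the gutter), and three 2-inch columns with two .3-inch gutters take 6.6 of the 7 inches, leaving too little for a fourth.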
Again, .1C can be used to return to single-column mode. In some versions of
ms, the .RC macro can be used to break columns. If you are in the left column, following
text will go to the top of the next column. If you are in the right column, .RC
will start a new page.

Page Headers and Footers


When you format a page with ms, the formatter is instructed to provide several lines at
the top and the bottom of the page for a header and a footer. Beginning with the
second page, a page number appears on a single line in the header and only blank lines
are printed for the footer.
The ms package allows you to define strings that appear in the header or footer.
You can place text in three locations in the header or footer: left justified, centered, and
right justified. For example, we could place the name of the client, the title of the
document, and the date in the page header and we could place the page number in the
footer.

.ds LH GGS
.ds CH Alcuin Project Proposal
.ds RH \*(DY
.ds CF Page %

You may notice that we use the string DY to supply today’s date in the header. In the
footer, we use a special symbol (%) to access the current page number. Here are the
resulting header and footer:

GGS               Alcuin Project Proposal               April 26, 1987

                             Page 21

Normally, you would define the header and footer strings at the start of the document,
so they would take effect throughout. However, note that there is nothing to prevent
you from changing one or more of them from page to page. (Changes to a footer string
will take effect on the same page; changes to a header string will take effect at the top
of the next page.)
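Each header or footer line is a three-part title: one part flush left, one centered, one flush right. The Python function below is a plain-text model of that layout under stated assumptions (fixed-width characters, a 60-column line, parts short enough not to overlap); the function itself is hypothetical, and the only ms behavior it borrows is that % in a title stands for the current page number.

```python
def three_part_title(left, center, right, width, page=None):
    """Plain-text model of a three-part header or footer line (LH/CH/RH or
    LF/CF/RF): left-justified, centered, and right-justified parts.
    '%' is replaced by the page number, as in ms title strings.
    Assumes fixed-width characters and parts that fit without overlapping."""
    if page is not None:
        left, center, right = (s.replace("%", str(page))
                               for s in (left, center, right))
    line = [" "] * width
    start = (width - len(center)) // 2
    line[start:start + len(center)] = center   # centered part
    line[:len(left)] = left                    # flush left
    line[width - len(right):] = right          # flush right
    return "".join(line)


print(three_part_title("GGS", "Alcuin Project Proposal", "April 26, 1987", 60))
print(three_part_title("", "Page %", "", 60, page=21))
```

Because the three parts are placed independently, leaving a string empty (as with LF and RF here) simply leaves that part of the line blank.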

Problems on the First Page


Because ms was originally designed to work with the cover sheet macros and one of
the standard Bell document types, there are a number of problems that can occur on the
first page of a document that doesn’t use these macros.*
First, headers are not printed on the first page, nor is it apparent how to get them
printed there if you want them. The trick is to invoke the internal .NP (new page)
macro at the top of your text. This will not actually start a new page, but will execute
the various internal goings-on that normally take place at the top of a page.
Second, it is not evident how to space down from the top if you want to start your
text at some distance down the page. For example, if you want to create your own title
page, the sequence:
.sp 3i
.ce
\s16The Invention of Movable Type\s0

will not work.


The page top macro includes an .ns request, designed to ensure that all leftover
space from the bottom of one page doesn’t carry over to the next, so that all pages start
evenly. To circumvent this on all pages after the first one, precede your spacing request
with an .rs (restore spacing) request. On the first page, a .fl request must precede
the .rs request.
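The no-space mode behind this behavior follows a simple rule in troff: after .ns, .sp requests are ignored until .rs is given or a line of output text appears. The Python class below is a toy model of that rule, with hypothetical names and abstract "lines" as the unit; it does not model the first-page .fl detail mentioned above.

```python
class SpacingModel:
    """Toy model of troff no-space mode, which the ms page-top macro turns on.

    While no-space mode is on, .sp is ignored; .rs turns it off, and so does
    the next line of output text. Units are abstract "lines" here, and the
    first-page .fl detail is not modeled.
    """

    def __init__(self):
        self.no_space = False
        self.space_output = 0

    def ns(self):          # .ns: enter no-space mode
        self.no_space = True

    def rs(self):          # .rs: restore spacing
        self.no_space = False

    def text(self):        # a line of output text also ends no-space mode
        self.no_space = False

    def sp(self, n=1):     # .sp n: ignored in no-space mode
        if not self.no_space:
            self.space_output += n


top = SpacingModel()
top.ns()        # what the page-top macro does
top.sp(3)       # ignored
top.rs()
top.sp(3)       # honored
print(top.space_output)
```

This is why a bare .sp 3i at the top of a page vanishes, while the same request preceded by .rs moves the text down as intended.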

Extensions to ms
In many ways, ms can be used to give you a head start on defining your own macro
package. Many of the features that are missing in ms can be supplied by user-defined
macros. Many of these features are covered in Chapters 14 through 18, where, for
example, we show macros for formatting numbered lists.

*This problem actually can occur on any page, but is most frequently encountered on the first page.
CHAPTER 6

The mm Macros

A macro package provides a way of describing the format of various kinds of documents.
Each document presents its own specific problems, and macros help to provide
a simple and flexible solution. The mm macro package is designed to help you format
letters, proposals, memos, technical papers, and reports. A text file that contains mm
macros can be processed by either nroff or troff, the two text formatting programs
in UNIX. The output from these programs can be displayed on a terminal screen
or printed on a line printer, a laser printer, or a typesetter.
Some users of the mm macro package learn only a few macros and work productively.
Others choose from a variety of macros to produce a number of different formats.
More advanced users modify the macro definitions and extend the capabilities of
the package by defining their own special-purpose macros.
Macros are the words that make up a format description language. Like words,
the result of a macro is often determined by context. That is, you may not always
understand your output by looking up an individual macro, just as you may not understand
the meaning of an entire sentence by looking up a particular word. Without examining
the macro definition, you may find it hard to figure out which macro is causing
a particular result. Macros are interrelated; some macros call other macros, like a subroutine
in a program, to perform a particular function.
After finding out what the macro package allows you to do, you will probably
decide upon a particular format that you like (or one that has evolved according to the
decisions of a group of people). To describe that format, you are likely to use only a
few of the macros, those that do the job. In everyday use, you want to minimize the
number of codes you need to format documents in a consistent manner.

Formatting a Text File


To figure out the role of a macro package such as mm, it may help to consider the distinction
between formatting and format. Formatting is an operation, a process of supplying
and executing instructions. You can achieve a variety of results, some pleasing,
some not, by any combination of formatting instructions. A format is a consistent product,
achieved by a selected set of formatting instructions. A macro package makes it
possible for a format to be recreated again and again with minimal difficulty. It
encourages the user to concentrate more on the requirements of a document and less on
the operations of the text formatter.
Working with a macro package will help reduce the number of formatting instructions
you need to supply. This means that a macro package will take care of many
things automatically. However, you should gradually become familiar with the operations
of the nroff/troff formatter and the additional flexibility it offers to define
new formats. If you have a basic understanding of how the formatter works, as
described in Chapter 4, you will find it easier to learn the intricacies of a macro package.

Invoking nroff/troff with mm


The mm command is a shell script that invokes the nroff formatter and reads in the
files that contain the mm macro definitions before processing the text file(s) specified
on the command line.
$ mm option(s) filename(s)
If more than one file is specified on the command line, the files are concatenated before
processing. There are a variety of options for invoking preprocessors and postprocessors,
naming an output device, and setting various number registers to alter default
values for a document. Using the mm command is the same as invoking nroff
explicitly with the -mm option.
Unless you specify otherwise, the mm command sets nroff's -T option to
the terminal type set in your login environment. By default, output is sent to the terminal
screen. If you have problems viewing your output, or if you have a graphics terminal,
you may want to specify another device name using the -T option. For a list of
available devices, see Appendix B. The mm command also has a -c option, which
invokes the col filter to remove reverse linefeeds, and options to invoke tbl (-t)
and eqn (-e).
When you format a file to the screen, the output usually streams by too swiftly to
read, just as when you cat a file to the screen. Pipe the output of the mm command
through either of the paging programs, pg or more, to view one screenful at a time.
This will give you a general indication that the formatting commands achieved the
results you had expected. To print a file formatted with mm, simply pipe the output to
the print spooler (e.g., lp) instead of to a screen paging program.
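To make these options concrete, here is a small Python helper that spells out the kind of pipeline an mm invocation stands for. It is only a sketch under assumptions: the real mm script varies between sites, some systems run neqn rather than eqn for nroff output, and the function name mm_pipeline is our own.

```python
import shlex


def mm_pipeline(files, device=None, tbl=False, eqn=False, col=False, sink=None):
    """Return a shell pipeline roughly equivalent to running mm.

    tbl and eqn stand for mm's -t and -e options, col for -c, and
    device for -T. sink is a final program such as "pg" or "lp".
    """
    stages = []
    if tbl:
        stages.append(["tbl"])
    if eqn:
        stages.append(["eqn"])
    formatter = ["nroff", "-mm"]
    if device:
        formatter.append("-T" + device)
    stages.append(formatter)
    stages[0] = stages[0] + list(files)  # the first stage reads the input files
    if col:
        stages.append(["col"])           # remove reverse linefeeds
    if sink:
        stages.append([sink])            # pager or print spooler
    return " | ".join(shlex.join(stage) for stage in stages)


print(mm_pipeline(["letter"], tbl=True, col=True, sink="pg"))
# tbl letter | nroff -mm | col | pg
```

Reading the pipeline left to right shows why preprocessors such as tbl must see the file before nroff does.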
Many of the actions that a text formatter performs are dependent upon how the
document is going to be printed. If you want your document to be formatted with
troff instead of nroff, use the mmt command (another shell script) or invoke
troff directly, using the -mm option. The mmt command prepares output for laser
printers and typesetters. The formatted output should be piped directly to the print
spooler (e.g., lp) or directed to a file and printed separately. You will probably need
to check at your site for the proper invocation of mmt if your site supports more than
one type of laser printer or typesetter.
130 0 UNlX Text Processing 0

If you are using otroff, be sure you don't let troff send the output to your
terminal because, in all probability, it will cause your terminal to hang, or at least to
scream and holler.
In this chapter, we will generally show the results of the mm command rather
than mmt; that is, we'll be showing nroff rather than troff output. Where the subject
under discussion is better demonstrated by troff, we will show troff output
instead. We assume that by now, you will be able to tell which of the programs has
been used, without our mentioning the actual commands.

Problems in Getting Formatted Output


When you format an mm-coded document, you may get only a portion of your formatted
document. Or you may get none of it. Usually, this is because the formatter has
had a problem executing the codes as they were entered in the input file. Most of the
time it is caused by omitting one of the macros that must be used in pairs, such as .DS and .DE.
When formatting stops like this, one or more error messages might appear on your
screen, helping you to diagnose the problems. These messages refer to the line numbers
in the input file where the problems appear to be, and try to tell you what is missing:

ERROR:(filename) line number
     Error message

Sometimes, you won't get error messages, but your output will break midway. Generally,
you have to go into the file at the point where it broke, or before that point, and
examine the macros or a sequence of macros. You can also run a program on the input
file to examine the code you have entered. This program, available at most sites, is
called checkmm.

Default Formatting
In Chapter 4, we looked at a sample letter formatted by nroff. It might be interesting,
before putting any macros in the file, to see what happens if we format letter
as it is, this time using the mm command to read in the mm macro package.
Refer to Figure 6-1 and note that:

a page number appears in a header at the top of the page;
the address block still forms two long lines;
lines of input text have been filled, forming block paragraphs;
the right margin is ragged, not justified as it would be with plain nroff;
the text is not hyphenated;
space has been allocated for a page with top, bottom, left, and right margins.
- 1 -

April 1, 1987

Mr. John Fust Vice President, Research and
Development Gutenberg Galaxy Software Waltham,
Massachusetts 02159

Dear Mr. Fust:

In our conversation last Thursday, we discussed a
documentation project that would produce a user's
manual on the Alcuin product. Yesterday, I
received the product demo and other materials that
you sent me.

Going through a demo session gave me a much better
understanding of the product. I confess to being
amazed by Alcuin. Some people around here,
looking over my shoulder, were also astounded by
the illustrated manuscript I produced with Alcuin.
One person, a student of calligraphy, was really
impressed.

In the next couple of days, I'll be putting
together a written plan that presents different
strategies for documenting the Alcuin product.
After I submit this plan, and you have had time to
review it, let's arrange a meeting at your company
to discuss these strategies.

Thanks again for giving us the opportunity to bid
on this documentation project. I hope we can
decide upon a strategy and get started as soon as
possible in order to have the manual ready in time
for the first customer shipment. I look forward to
meeting with you towards the end of next week.

Sincerely,

Fred Caslon

Fig. 6-1. A Raw mm-formatted File



Page Layout
When you format a page with mm, the formatter is instructed to provide several lines at
the top and the bottom of the page for a header and a footer. By default, a page number
appears on a single line in the header and only blank lines are printed for the footer.
There are basically two different ways to change the default header and footer.
The first way is to specify a command-line parameter with the mm or mmt commands
to set the number register N. This allows you to affect how pages are numbered and
where the page number appears. The second way is to specify in the input file a macro
that places text in the header or footer. Let’s look at both of these techniques.

Setting Page Numbering Style


When you format a document, pages are numbered in sequence up to the end of the
document. This page number is usually printed in the header, set off by dashes.
-1-

Another style of page numbering, used in documents such as technical manuals,
numbers pages specific to a section. The first page of the second section would be
printed as:
2-1

The other type of change affects whether or not the page number is printed in the
header at the top of the first page.
The number register N controls these actions. This register has a default setting
of 0 and can take values from 0 through 5. Table 6-1 shows the effect of these values.

TABLE 6-1. Page Number Styles, Register N

Value  Action
0      The page number prints in the header on all pages.
       This is the default page numbering style.
1      On page 1, the page number is printed in place of
       the footer.
2      On page 1, the page number is not printed.
3      All pages are numbered by section, and the page
       number appears in the footer. This setting affects
       the defaults of several section-related registers and
       macros. It causes a page break for a top-level heading
       (Ej=1), and invokes both the .FD and .RP
       macros to reset footnote and reference numbering.
4      The default header containing the page number is
       suppressed, but it has no effect on a header supplied
       by a page header macro.
5      All pages are numbered by section, and the page
       number appears in the footer. In addition, labeled
       displays (.FG, .TB, .EX, and .EC) are also
       numbered by section.

The register N can be set from the command line using the -r option. If we set
it to 2, no page number will appear at the top of page 1 when we print the sample letter:
$ mm -rN2 letter | lp
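The rules in Table 6-1 can be restated as a small decision function. This Python model is our own simplification, not mm's code: it covers only where the default page number goes, assumes the "- n -" and "section-page" forms shown earlier, and ignores any text supplied with .PH or .PF.

```python
def page_number_text(n, page, section=1, page_in_section=None):
    """Return (header, footer) page-number text for register N = n."""
    if page_in_section is None:
        page_in_section = page
    plain = f"- {page} -"                       # style used in the header
    by_section = f"{section}-{page_in_section}"  # section-page style
    if n == 0:
        return plain, ""
    if n == 1:
        return ("", plain) if page == 1 else (plain, "")
    if n == 2:
        return ("", "") if page == 1 else (plain, "")
    if n in (3, 5):
        return "", by_section
    if n == 4:
        return "", ""
    raise ValueError("register N takes values 0 through 5")


# With -rN2, page 1 of the letter has no number; page 2 does.
print(page_number_text(2, 1), page_number_text(2, 2))  # ('', '') ('- 2 -', '')
```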

Header and Footer Macros


The mm package has a pair of macros for defining what should appear in a page header
(.PH) and a page footer (.PF). There is also a set of related macros for specifying
page headers and footers for odd-numbered pages (.OH and .OF) or for even-numbered
pages (.EH and .EF). All of these macros have the same form, allowing
you to place text in three places in the header or footer: left justified, centered, and right
justified. This is specified as a single argument in double quotation marks, consisting of
three parts delimited by single quotation marks.
'left'center'right'
For example, we could place the name of a client, the
title of the document, and the date in the page header,
and we could place the page number in the footer:
.PH "'GGS'Alcuin Project Proposal'\*(DT'"
.PF "''Page %''"
You may notice that we use the string DT to supply today's date in the header. The
following header appears at the top of the page:

GGS               Alcuin Project Proposal               April 26, 1987

In the footer, we use a special symbol (%) to access the current page number. Only text
to be centered was specified; however, the four delimiters were still required to place
the text correctly. This footer appears at the bottom of the page:

Page 2

The header and footer macros override the default header and footer.
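The three-part argument can be modeled in a few lines of Python. This is an illustrative sketch, not mm's implementation: it assumes, as troff's .tl request does, that the first character of the string is the delimiter and that % stands for the page number, and it assumes strings such as \*(DT have already been expanded. The 60-character width is an assumed nroff line length.

```python
def three_part_title(spec, page, width=60):
    """Lay out a 'left'center'right' title on a line of the given width."""
    delim = spec[0]
    parts = spec.split(delim)
    # "'a'b'c'" splits into ['', 'a', 'b', 'c', '']: four delimiters, three fields
    if len(parts) != 5 or parts[0] or parts[4]:
        raise ValueError("expected the form 'left'center'right'")
    left, center, right = (p.replace("%", str(page)) for p in parts[1:4])
    center_start = (width - len(center)) // 2
    line = left.ljust(center_start) + center
    return line + right.rjust(width - len(line))


print(repr(three_part_title("''Page %''", 2, width=20)))  # '       Page 2       '
print(three_part_title("'GGS'Alcuin Project Proposal'April 26, 1987'", 1))
```

The first call shows why the footer needs all four delimiters even though only the center field has text: the empty left and right fields still occupy their positions.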
Setting Other Page Control Registers
The mm package uses number registers to supply the values that control line length,
page offset, point size, and page length, as shown in Table 6-2.

TABLE 6-2. Number Registers

Register  Contains                    troff Default  nroff Default
O         Page offset (left margin)   .75i           .5i
N         Page numbering style        0              0
P         Page length                 66v            66 lines
S         Point size (troff only)     10             NA
W         Line length or width        6i             60

These registers must be defined before the mm macro package is read by nroff
or troff. Thus, they can be set from the command line using the -r option, as we
showed when we gave a new value for register N. Values of registers O and W for
nroff must be given in character positions (depending on the character size of the
output device for nroff, .5i might translate as either 5 or 6 character positions), but
troff can accept any of the units described in Chapter 4. For example:
$ mm -rN2 -rW65 -rO10 file
but:
$ mmt -rN2 -rW6.5i -rO1i file
Or the page control registers can be set at the top of your file, using the .so request to
read in the mm macro package, as follows:
.nr N 2
.nr W 65
.nr O 10
.so /usr/lib/tmac/tmac.m

If you do it this way, you cannot use the mm command. Use nroff or troff
without the -mm option. Specifying -mm would cause the mm macro package to be
read twice; mm would trap that error and bail out.
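The arithmetic behind the two command lines shown earlier is easy to check. The Python sketch below assumes a 10-character-per-inch nroff device (a 12-pitch device would give different values, as noted above); the helper names are our own.

```python
def nroff_positions(inches, chars_per_inch=10):
    """Convert a distance in inches to nroff character positions."""
    return round(inches * chars_per_inch)


def register_options(registers):
    """Build -r options such as -rN2 from a {name: value} mapping."""
    return " ".join(f"-r{name}{value}" for name, value in registers.items())


# A 6.5-inch line and a 1-inch offset on a 10-pitch device:
print("mm", register_options({"N": 2, "W": nroff_positions(6.5),
                              "O": nroff_positions(1)}), "file")
# mm -rN2 -rW65 -rO10 file

# troff (via mmt) accepts scaled values directly:
print("mmt", register_options({"N": 2, "W": "6.5i", "O": "1i"}), "file")
# mmt -rN2 -rW6.5i -rO1i file
```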

Paragraphs
The .P macro marks the beginning of a paragraph.
.P
In our conversation last Thursday, we discussed a
This macro produces a left-justified, block paragraph. A blank line in the input file also
results in a left-justified, block paragraph, as you saw when we formatted an uncoded
file.
However, the paragraph macro controls a number of actions in the formatter,
many of which can be changed by overriding the default values of several number registers.
The .P macro takes a numeric argument that overrides the default paragraph
type, which is a block paragraph. Specifying 1 results in an indented paragraph:
.P 1
Going through a demo session gave me a much better
The first three paragraphs formatted for the screen follow:

In our conversation last Thursday, we discussed a
documentation project that would produce a user's manual
on the Alcuin product. Yesterday, I received the product
demo and other materials that you sent me.

     Going through a demo session gave me a much better
understanding of the product. I confess to being amazed
by Alcuin. Some people around here, looking over my
shoulder, were also astounded by the illustrated
manuscript I produced with Alcuin. One person, a student
of calligraphy, was really impressed.

In the next couple of days, I'll be putting together a
written plan that presents different strategies for
documenting the Alcuin product. After I submit this plan,
and you have had time to review it, let's arrange a
meeting at your company to discuss these strategies.

The first line of the second paragraph is indented five spaces. (In troff the default
indent is three ens.) Notice that the paragraph type specification changes only the
second paragraph. The third paragraph, which is preceded in the input file by .P
without an argument, is a block paragraph.
If you want to create a document in which all the paragraphs are indented, you
can change the number register that specifies the default paragraph type. The value of
Pt is 0 by default, producing block paragraphs. For indented paragraphs, set the value
of Pt to 1. Now the .P macro will produce indented paragraphs.
.nr Pt 1
If you want to obtain a block paragraph after you have changed the default type,
specify an argument of 0:
.P 0

When you specify a type argument, it overrides whatever paragraph type is in effect.
There is a third paragraph type that produces an indented paragraph with some
exceptions. If Pt is set to 2, paragraphs are indented except those following section
headings, lists, and displays. It is the paragraph type used in this book.
The following list summarizes the three default paragraph types:

0 Block
1 Indented
2 Indented with exceptions

Vertical Spacing
The paragraph macro also controls the spacing between paragraphs. The amount of
space is specified in the number register Ps. This amount differs between nroff
and troff.
With nroff, the .P macro has the same effect as a blank line, producing a full
space between paragraphs. However, with troff, the .P macro outputs a blank
space that is equal to one-half of the current vertical spacing setting. Basically, this
means that a blank line will cause one full space to be output, and the .P macro will
output half that space.
The .P macro invokes the .SP macro for vertical spacing. The .SP macro takes a
numeric argument requesting that many lines of space.
Sincerely,
.SP 3
Fred Caslon

Three lines of space will be provided between the salutation and the signature lines.
You do not achieve the same effect if you enter .SP macros on three consecutive
lines. The vertical space does not accumulate and one line of space is output, not
three.
Two or more consecutive .SP macros with numeric arguments result in the
spacing specified by the greatest argument. The other arguments are ignored.
.SP 5
.SP
.SP 2
In this example, five lines are output, not eight.
Because the .P macro calls the .SP macro, it means that two or more consecu-
tive paragraph macros will have the same effect as one.

The .SP Macro versus the .sp Request


There are several differences between the .SP macro and the .sp request. A series
of .sp requests does cause vertical spacing to accumulate. The following three
requests produce eight blank lines:
.sp 5
.sp
.sp 2

The argument specified with the .SP macro cannot be scaled, nor can it be a
negative number. The .SP macro automatically works in the scale (v) of the current
vertical spacing. However, both .SP and .sp accept fractions, so that each of the
following codes has the same result:
.sp .3v     .SP .3     .sp .3
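The difference between the two spacing commands reduces to two one-line rules: consecutive .sp requests add up, while consecutive .SP macros keep only the largest request. The Python sketch below models a single run of consecutive calls with no text in between, and counts a call without an argument as one line; it is an illustration, not mm's implementation.

```python
def space_from_sp_requests(args):
    """troff .sp: consecutive requests accumulate."""
    return sum(1 if a is None else a for a in args)


def space_from_SP_macros(args):
    """mm .SP: consecutive macros yield only the greatest request."""
    return max(1 if a is None else a for a in args)


run = [5, None, 2]                   # .sp 5 / .sp / .sp 2, or the .SP versions
print(space_from_sp_requests(run))   # 8
print(space_from_SP_macros(run))     # 5
```

Because .P calls .SP, the same max rule explains why several consecutive .P macros produce no more space than one.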

Justification
A document formatted by nroff with mm produces, by default, unjustified text (an
uneven or ragged-right margin). When formatted by troff, the same document is
automatically justified (the right margin is even).
If you are using both nroff and troff, it is probably a good idea to explicitly
set justification on or off rather than depend upon the default chosen by the formatter.
Use the .SA macro (set adjustment) to set document-wide justification. An
argument of 0 specifies no justification; 1 specifies justification.
If you insert this macro at the top of your file:
.SA 1

both nroff and troff will produce right-justified paragraphs like the following:

In our conversation last Thursday, we discussed
a documentation project that would produce a user's
manual on the Alcuin product. Yesterday, I received the
product demo and other materials that you sent me.

Word Hyphenation
One way to achieve better line breaks and more evenly filled lines is to instruct the formatter
to perform word hyphenation.
Hyphenation is turned off in the mm macro package. This means that the formatter
does not try to hyphenate words to make them fit on a line unless you request it
by setting the number register Hy to 1. If you want the formatter to automatically
hyphenate words, insert the following line at the top of your file:

.nr Hy 1
Most of the time, the formatter breaks up a word correctly when hyphenating. Sometimes,
however, it does not and you have to explicitly tell the formatter either how to
split a word (using the .hw request) or not to hyphenate at all (using the .nh
request).

Displays
When we format a text file, the line breaks caused by carriage returns are ignored by
nroff/troff. How text is entered on lines in the input file does not affect how
lines are formed in the output. It doesn’t really matter whether information is typed on
three lines or four; it appears the same after formatting.
You probably noticed that the name and address at the beginning of our sample
file did not come out in block form. The four lines of input ran together and produced
two filled lines of output:
Mr. John Fust Vice President, Research and Development
Gutenberg Galaxy Software Waltham, Massachusetts 02159

The formatter, instead of paying attention to carriage returns, acts on specific macros or
requests that cause a break, such as .P, .SP, or a blank line. The formatter request
.br is probably the simplest way to break a line:
Mr. John Fust
.br
Vice President, Research and Development
The .br request is most appropriate when you are forcing a break of a single line.
For larger blocks of text, the mm macro package provides a pair of macros for indicating
that a block of text should be output just as it was entered in the input file. The
.DS (display start) macro is placed at the start of the text, and the .DE (display end)
macro is placed at the end:
.DS
Mr. John Fust
Vice President, Research and Development
Gutenberg Galaxy Software
Waltham, Massachusetts 02159
.DE
The formatter does not fill these lines, so the address block is output on four lines, just
as it was typed. In addition, the .DE macro provides a line of space following the
display.

Our Coding Efforts, So Far


We have pretty much exhausted what we can do using the sample letter. Before going
on to larger documents, you may want to compare the coded file in Figure 6-2 with the
nroff-formatted output in Figure 6-3. Look them over and make sure you understand
what the different macros are accomplishing.

.nr Pt 1
.SA 1
April 1, 1987
.SP 2
.DS
Mr. John Fust
Vice President, Research and Development
Gutenberg Galaxy Software
Waltham, Massachusetts 02159
.DE
Dear Mr. Fust:
.P
In our conversation last Thursday, we discussed a
documentation project that would produce a user's manual
on the Alcuin product. Yesterday, I received the product
demo and other materials that you sent me.
.P
Going through a demo session gave me a much better
understanding of the product. I confess to being amazed
by Alcuin. Some people around here, looking over my
shoulder, were also astounded by the illustrated
manuscript I produced with Alcuin. One person, a student
of calligraphy, was really impressed.
.P
In the next couple of days, I'll be putting together a
written plan that presents different strategies for
documenting the Alcuin product. After I submit this plan,
and you have had time to review it, let's arrange a
meeting at your company to discuss these strategies.
.P
Thanks again for giving us the opportunity to bid on this
documentation project. I hope we can decide upon a
strategy and get started as soon as possible in order to
have the manual ready in time for the first customer
shipment. I look forward to meeting with you towards the
end of next week.
.SP
Sincerely,
.SP 2
Fred Caslon

Fig. 6-2. Coded File



- 1 -

April 1, 1987

Mr. John Fust
Vice President, Research and Development
Gutenberg Galaxy Software
Waltham, Massachusetts 02159

Dear Mr. Fust:

     In our conversation last Thursday, we
discussed a documentation project that would
produce a user's manual on the Alcuin product.
Yesterday, I received the product demo and other
materials that you sent me.

     Going through a demo session gave me a much
better understanding of the product. I confess to
being amazed by Alcuin. Some people around here,
looking over my shoulder, were also astounded by
the illustrated manuscript I produced with Alcuin.
One person, a student of calligraphy, was really
impressed.

     In the next couple of days, I'll be putting
together a written plan that presents different
strategies for documenting the Alcuin product.
After I submit this plan, and you have had time to
review it, let's arrange a meeting at your company
to discuss these strategies.

     Thanks again for giving us the opportunity to
bid on this documentation project. I hope we can
decide upon a strategy and get started as soon as
possible in order to have the manual ready in time
for the first customer shipment. I look forward to
meeting with you towards the end of next week.

Sincerely,

Fred Caslon

Fig. 6-3. Formatted Output



We have worked through some of the problems presented by a very simple one-page
letter. As we move on, we will be describing specialized macros that address the
problems of multiple-page documents, such as proposals and reports. In many ways,
the macros for more complex documents are the feature performers in a macro package,
the ones that really convince you that a markup language is worth learning.

Changing Font and Point Size

When you format with nroff and print on a line printer, you can put emphasis on
individual words or phrases by underlining or overstriking. When you are using
troff and send your output to a laser printer or typesetter, you can specify variations
of type, font, and point size based on the capabilities of the output device.

Roman, Italic, and Bold Fonts


Most typefaces have at least three fonts available: roman, bold, and italic. Normal
body copy is printed in the roman font. You can change temporarily to a bold or italic
font for emphasis. In Chapter 4, you learned how to specify font changes using the
.ft request and inline \f escape sequences. The mm package provides a set of mnemonic
macros for changing fonts:

.B Bold
.I Italic
.R Roman

Each macro prints a single argument in a particular font. You might code a single sentence
as follows:
.B Alcuin
revitalizes an
.I age-old
tradition.

The printed sentence has a word in bold and one in italic. (In nroff, bold is
simulated by overstriking, and italics by underlining.)

Alcuin revitalizes an age-old tradition.

If no argument is specified, the selected font is current until it is explicitly changed:

The art of
.B
calligraphy
.R
is, quite simply,
.I
beautiful
.R
handwriting;

The previous example produces:

The art of calligraphy is, quite simply, beautiful handwriting;

You've already seen that the first argument is changed to the selected font. If you
supply a second argument, it is printed in the previous font. Each macro takes up to six
arguments for alternating font changes. (An argument is set off by a space; a phrase
must be enclosed within quotation marks to be taken as a single argument.) A good use
for the alternate argument is to supply punctuation, especially because of the restriction
that you cannot begin an input line with a period.
its opposite is
.B cacography .
This example produces:

its opposite is cacography.


If you specify alternate arguments consisting of words or phrases, you must supply the
spacing:
The ink pen has been replaced by a
.I light " pen."

This produces:

The ink pen has been replaced by a light pen.


Here's an example using all six arguments:
Alcuin uses three input devices, a
.B "light pen" ", a " "mouse" ", and a " "graphics tablet" "."

This produces:

Alcuin uses three input devices, a light pen, a mouse, and a graphics tablet.

There are additional macros for selecting other main and alternate fonts. These macros
also take up to six arguments, displayed in alternate fonts:

.BR Alternate bold and roman
.IB Alternate italic and bold
.RI Alternate roman and italic
.BI Alternate bold and italic
.IR Alternate italic and roman
.RB Alternate roman and bold

If you are using nroff, specifying a bold font results in character overstrike; specifying
an italic font results in an underline for each character (not a continuous rule).
Overstriking and underlining can cause problems on some printers and terminals.
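The alternating behavior shared by all of these macros amounts to pairing each argument with one of two fonts in turn. The Python sketch below is our own illustration: font names are plain labels, and for the single-font macros .B, .I, and .R the second font is simply "previous".

```python
def alternate_fonts(first, second, *args):
    """Pair each argument of an mm font macro with the font it prints in.

    Arguments 1, 3, 5 use `first`; arguments 2, 4, 6 use `second`.
    mm allows at most six arguments.
    """
    if len(args) > 6:
        raise ValueError("mm font macros take at most six arguments")
    fonts = (first, second)
    return [(fonts[i % 2], text) for i, text in enumerate(args)]


# .B "light pen" ", a " "mouse" ", and a " "graphics tablet" "."
pieces = alternate_fonts("bold", "previous",
                         "light pen", ", a ", "mouse", ", and a ",
                         "graphics tablet", ".")
print(pieces)
print("".join(text for _, text in pieces))  # light pen, a mouse, and a graphics tablet.
```

Note that the spacing comes entirely from the arguments themselves, which is why ", a " carries its own spaces.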

Changing Point Size


When formatting with troff, you can request a larger or smaller point size for the
type. A change in the point size affects how much vertical space is needed for the
larger or smaller characters. Normal body copy is set in 10-point type with the vertical
spacing 2 points larger.
You learned about the .ps (point size) and .vs (vertical spacing) requests in
Chapter 4. These will work in mm; however, mm also has a single macro for changing
both the point size and vertical space:

.S [point size] [vertical spacing]


The values for point size and vertical spacing can be set in relation to the current setting:
+ increments and - decrements the current value. For example, you could
specify relative point size changes:
.S +2 +2

or absolute ones:
.S 12 14

By default, if you don't specify vertical spacing, a relation of 2 points greater than the
point size will be maintained. A null value ("") does not change the current setting.
The new point size and vertical spacing remain in effect until you change them.
Simply entering the .S macro without arguments restores the previous settings:
.S
The mm package keeps track of the default, previous, and current values, making it
easy to switch between different settings using one of these three arguments:

D Default
P Previous
C Current
To restore the default values, enter:
.S D
The point size returns to 10 points and the vertical spacing is automatically reset to 12
points. To increase the vertical space to 16 points while keeping the point size the
same, enter:
.S C 16

In the following example for a letterhead, the company name is specified in 18-point
type and a tag line in 12-point type; then the default settings are restored:
.S 18
Caslon Inc.
.S 12
Communicating Expertise
.S D

The result is:

Caslon Inc.
Communicating Expertise

You can also change the font along with the point size, using the .I macro described
previously. Following is the tag line in 12-point italic:

Communicating Expertise
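The bookkeeping that .S performs, with its default, previous, and current values, can be imitated with a small class. This Python model rests on assumptions beyond the text: sizes are whole points, D for the vertical spacing means the default size plus 2, and C and P behave for the spacing argument as they do for the size. It is an illustration, not mm's definition of .S.

```python
def _resolve(arg, current, previous, default):
    """Turn one .S argument into an absolute number of points."""
    if arg in ("", "C"):       # null argument or C: keep the current value
        return current
    if arg == "P":
        return previous
    if arg == "D":
        return default
    if arg[0] in "+-":         # relative change, e.g. "+2"
        return current + int(arg)
    return int(arg)            # absolute value, e.g. "12"


class SizeState:
    """Toy model of mm's .S [point size] [vertical spacing] macro."""

    def __init__(self, size=10):
        self.default = size
        self.size, self.vs = size, size + 2
        self.prev_size, self.prev_vs = self.size, self.vs

    def S(self, *args):
        if not args:                   # .S alone: back to previous settings
            new_size, new_vs = self.prev_size, self.prev_vs
        else:
            new_size = _resolve(args[0], self.size, self.prev_size, self.default)
            if len(args) < 2:          # no spacing given: keep size + 2
                new_vs = new_size + 2
            else:
                new_vs = _resolve(args[1], self.vs, self.prev_vs, self.default + 2)
        self.prev_size, self.prev_vs = self.size, self.vs
        self.size, self.vs = new_size, new_vs
        return self.size, self.vs


# The letterhead example: 18 points, then 12, then the defaults.
s = SizeState()
print(s.S("18"), s.S("12"), s.S("D"))   # (18, 20) (12, 14) (10, 12)
```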
A special-purpose macro in mm reduces by 1 point the point size of a specified string.
The .SM macro can be followed by one, two, or three strings. Only one argument is
reduced; which one depends upon how many arguments are given. If you specify one
or two arguments, the first argument will be reduced by 1 point:
using
.SM UNIX ,
you will find

The second argument is concatenated to the first argument, so that the comma immediately
follows the word UNIX:

using UNIX, you will find


If you specify three arguments:
.SM [ UNIX ]
The second argument is reduced by one point, but the first and third arguments are
printed in the current point size, and all three are concatenated:
[UNIX]

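Which argument .SM shrinks depends only on how many arguments it gets. A minimal Python model, assuming a current size of 10 points, makes the rule explicit; it is our illustration rather than mm's definition.

```python
def sm(*args, size=10):
    """Return (text, point size) pairs for mm's .SM macro.

    With one or two arguments the first is reduced by 1 point;
    with three arguments the middle one is.
    """
    if not 1 <= len(args) <= 3:
        raise ValueError(".SM takes one to three arguments")
    reduced = 1 if len(args) == 3 else 0
    return [(text, size - 1 if i == reduced else size)
            for i, text in enumerate(args)]


print(sm("UNIX", ","))       # [('UNIX', 9), (',', 10)]
print(sm("[", "UNIX", "]"))  # [('[', 10), ('UNIX', 9), (']', 10)]
```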
More about Displays
Broadly speaking, a display is any kind of information in the body of a document that
cannot be set as a normal paragraph. Displays can be figures, quotations, examples,
tables, lists, equations, or diagrams.
The display macros position the display on the page. Inside the display, you
might use other macros or preprocessors such as tbl or eqn. You might simply
have a block of text that deserves special treatment.
The display macros can be relied upon to provide:

adequate spacing before and after the display;
horizontal positioning of the display as a left-justified, indented, or centered block;
proper page breaks by keeping the entire display together.

The default action of the .DS macro is to left justify the text block in no-fill mode. It
provides no indentation from the current margins.
You can specify a different format for a display by specifying up to three argu-
ments with the .DS macro. The syntax is:

.DS [format] [fill mode] [right indent]


The format argument allows you to specify an indented or centered display. The argu-
ment can be set by a numeric value or a letter corresponding to the following options:

0  L   No indent (default)
1  I   Indented
2  C   Center each line
3  CB  Center entire display

For consistency, the indent of displays is initially set to be the same as indented para-
graphs (five spaces in nroff and three ens in troff), although these values are
maintained independently in two different number registers, Pi and Si. (To change
the defaults, simply use the .nr request to put the desired value in the appropriate
register.)
A display can be centered in two ways: either each individual line in the display
is centered (C) or the entire display is centered as a block based on the longest line of
the display (CB).
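The difference between C and CB is easiest to see on a short example. The following Python sketch is only a model; `ds_format` and `center_display` are made-up names. It works in fixed-width character cells, as nroff does, and takes the numeric and letter codes from the format table above.

```python
# Hypothetical model of the first .DS argument; not mm's implementation.
DS_FORMATS = {
    "0": "L", "L": "L",      # no indent (default)
    "1": "I", "I": "I",      # indented display
    "2": "C", "C": "C",      # center each line
    "3": "CB", "CB": "CB",   # center the block on its longest line
}


def ds_format(arg=""):
    """Normalize a numeric or letter format argument to its letter form."""
    if arg == "":
        return "L"
    if arg not in DS_FORMATS:
        raise ValueError(f"unknown display format: {arg!r}")
    return DS_FORMATS[arg]


def center_display(lines, width, fmt):
    """Center display lines within a `width`-character line, as C or CB would."""
    mode = ds_format(fmt)
    if mode == "C":
        # Each line is centered on its own.
        return [line.center(width).rstrip() for line in lines]
    if mode == "CB":
        # One offset, taken from the longest line, is applied to every line.
        pad = (width - max(len(line) for line in lines)) // 2
        return [" " * pad + line for line in lines]
    raise ValueError("only C and CB center a display")
```

With C, a short line drifts toward the middle on its own. With CB, every line keeps the same left edge, so the columns of a table stay aligned. This is why CB suits the tbl example that follows.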
For instance, the preceding list was formatted using tbl, but its placement was
controlled by the display macro.

.DS CB
.TS
table specifications
.TE
.DE
The fill mode argument is represented by either a number or a letter:

0  N  No-fill mode (default)
1  F  Fill mode

The right indent argument is a numeric value that is subtracted from the right
margin. In nroff, this value is automatically scaled in ens. In troff, you can
specify a scaled number; otherwise, the default is ems.
The use of fill mode, along with other indented display options, can provide a
paragraph indented on both sides. This is often used in reports and proposals that quote
at length from another source. For example:
.P
I was particularly interested in the following comment
found in the product specification:
.DS I F 5
Users first need a brief introduction to what the product
does. Sometimes this is more for the benefit of people
who haven't yet bought the product, and
are just looking at the manual.
However, it also serves to put the rest of
the manual, and the product itself, in the proper context.
.DE
The result of formatting is:

I was particularly interested in the following comment
found in the product specification:

     Users first need a brief introduction to
     what the product does. Sometimes this is
     more for the benefit of people who haven't
     yet bought the product, and are just looking
     at the manual. However, it also serves to
     put the rest of the manual, and the product
     itself, in the proper context.

The use of tabs often presents a problem outside of displays. Material that has
been entered with tabs in the input file should be formatted in no-fill mode, the default
setting of the display macros. The following table was designed using tabs to provide
the spacing:

.DF I
Dates        Description of Task

June 30      Submit audience analysis
July 2       Meeting to review audience analysis
July 15      Submit detailed outline
August 1     Submit first draft
August 5     Return of first draft
August 8     Meeting to review comments
.DE

This table appears in the output just as it looks in the file. If this material had not been
processed inside a display in no-fill mode, the columns would be improperly aligned.

Static and Floating Displays


There are two types of displays, static and floating. The difference between them has to
do with what happens when a display cannot fit in its entirety on the current page.
Both the static and the floating display output the block at the top of the next page if it
doesn't fit on the current page; however, only the floating display allows text that fol-
lows the display to be used to fill up the preceding page. A static display maintains the
order in which a display was placed in the input file.
We have already used .DS and .DE to mark the beginning and end of a static
display. To specify a floating display, the closing mark is the same, but the beginning
is marked by the .DF macro. The options are the same as for the .DS macro.
In the following example of an input file, numbers are used instead of actual lines
of text:
1
2
3
4
5
.DF
Long Display
.DE
6
7
8
9
10

The following two formatted pages might be produced, assuming that there are a suffi-
cient number of lines in the display to cause a page break:

-1-                    -2-

1                      Long Display
2                      8
3                      9
4                      10
5
6
7

If there had been room on page 1 to fit the display, it would have been placed there, and
lines 6 and 7 would have followed the display, as they did in the input file.
If a static display had been specified, the display would be placed in the same
position on page 2, and lines 6 and 7 would have to follow it, leaving extra space at the
bottom of page 1. A floating display attempts to make the best use of the available
space on a page.
The formatter maintains a queue to hold floating displays that it has not yet out-
put. When the top of a page is encountered, the next display in the queue is output.
The queue is emptied in the order in which it was filled (first in, first out). Two
number registers, De and Df, allow you to control when displays are removed from
the queue and placed in position.
At the end of a section, as indicated by the section macros .H and .HU (which
we will see shortly), or at the end of the input file, any floating displays that remain in
the queue will be placed in the document.
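The static/floating difference and the first-in, first-out queue can be modeled in a few lines of Python. This is a simplified sketch, not mm's algorithm; `paginate` is an invented name. It makes three simplifying choices. Pages are measured in output lines. A display is an unbreakable block no taller than a page. The De and Df registers are ignored.

```python
from collections import deque


def paginate(items, page_len, floating):
    """Lay out items on pages of `page_len` lines.

    items: ("text", label) for one line, or ("display", label, height).
    Returns a list of pages, each a list of labels.
    """
    pages = [[]]
    used = 0
    queue = deque()  # floating displays waiting for a page top (FIFO)

    def new_page():
        nonlocal used
        pages.append([])
        used = 0
        # At the top of a page, output queued displays in the order queued.
        while queue and used + queue[0][1] <= page_len:
            label, height = queue.popleft()
            pages[-1].append(label)
            used += height

    for item in items:
        if item[0] == "display":
            _, label, height = item
            if height > page_len:
                raise ValueError("display taller than a page")
            if floating and (queue or used + height > page_len):
                queue.append((label, height))  # following text fills this page
                continue
            if used + height > page_len:
                new_page()  # static: the display moves, and text waits behind it
        else:
            _, label = item
            height = 1
            if used + 1 > page_len:
                new_page()
        pages[-1].append(label)
        used += height

    while queue:  # end of input: flush displays still waiting
        new_page()
    return pages
```

Take the ten-line example above, with a seven-line page and a four-line display. With `floating=True`, page 1 holds lines 1 through 7, and page 2 starts with the display followed by lines 8 through 10. With `floating=False`, page 1 stops after line 5, leaving the extra space described in the text.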

Display Labels
You can provide a title or caption for tables, equations, exhibits, and figures. In addi-
tion, the display can be labeled and numbered in sequence, as well as printed in a table
of contents at the end of the file. The following group of macros is available:

.EC Equation
.EX Exhibit
.FG Figure
.TB Table

All of these macros work the same way and are usually specified within a pair of
.DS/.DE macros, so that the title and the display appear on the same page. Each
macro can be followed by a title. If the title contains spaces, it should be enclosed
within quotation marks. The title of a table usually appears at the top of a table, so it
must be specified before the .TS macro that signals to tbl the presence of a table
(see Chapter 8).

.TB "List of Required Resources"
.TS

The label is centered:

Table 1. List of Required Resources
If the title exceeds the line length, then it will be broken onto several lines. Addi-
tional lines are indented and begin at the first character of the title.

Table 1. List of Required Resources Provided by Gutenberg Galaxy
         Software
The label for equations, exhibits, and figures usually follows the display. The fol-
lowing:
.FG "Drawing with a Light Pen"
produces a centered line:

Figure 1. Drawing with a Light Pen
The default format of the label can be changed slightly by setting the number
register Of to 1. This replaces the period with a dash.

Figure 1 - Drawing with a Light Pen
Second and third arguments, specified with the label macros, can be used to
modify or override the default numbering of displays. Basically, the second argument
is a literal and the third argument a numeric value that specifies what the literal means.
If the third argument is

0 then the second argument will be treated as a prefix;
1 then the second argument will be treated as a suffix;
2 then the second argument replaces the normal table number.

Thus, a pair of related tables could be specified as 1a and 1b using the following labels:
.TB "Estimated Hours: June, July, and August" a 1
.TB "Estimated Hours: September and November," 1b 2

(These labels show two different uses of the third argument. Usually, you would con-
sistently use one technique or the other for a given set of tables.)
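The prefix/suffix/replace rule, together with the Of register, can be summarized as a small Python function. It is a sketch of the rules as written here, and `display_label` and its parameters are invented. It only builds the label string; it does not track the running table or figure counter.

```python
def display_label(kind, number, title, literal=None, mode=None, of_register=0):
    """Build a label such as 'Table 1. Title'.

    literal/mode mirror the second and third label-macro arguments:
    mode 0 = prefix, 1 = suffix, 2 = replace the number.
    of_register=1 swaps the period for a dash, as described for Of.
    """
    if literal is None:
        num = str(number)
    elif mode == 0:
        num = literal + str(number)
    elif mode == 1:
        num = str(number) + literal
    elif mode == 2:
        num = literal
    else:
        raise ValueError("mode must be 0, 1, or 2 when a literal is given")
    separator = " - " if of_register == 1 else ". "
    return f"{kind} {num}{separator}{title}"
```

The two .TB lines above reach the same numbers by different routes. The first uses suffix mode with "a" on table 1. The second uses replace mode with "1b".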
For tbl, the delimiters for tables are .TS/.TE. For eqn, the delimiters for
equations are .EQ/.EN. For pic, the delimiters for pictures or diagrams are
.PS/.PE. These pairs of delimiters indicate a block to be processed by a specific

preprocessor. You will find the information about each of the preprocessors in Chapters
8 through 10. As mentioned, the preprocessor creates the display, the display macros
position it, and the label macros add titles and a number.
Although it may seem a minor point, each of these steps is independent, and
because they are not fully integrated, there is some overlap.
The label macros, being independent of the preprocessors, do not make sure that a
display exists or check whether a table has been created with tbl. You can create a
two-column table using tabs or create a figure using character symbols and still give it a
label. Or you can create a table heading as the first line of your table and let tbl pro-
cess it (tbl won’t provide a number and the table won’t be collected for the table of
contents).
In tbl, you can specify a centered table and not use the .DS/.DE macros.
But, as a consequence, nroff/troff won’t make a very good attempt at keeping the
table together on one page, and you may have to manually break the page. It is recom-
mended that you use the display macros throughout a document, regardless of whether
you can get the same effect another way, because if nothing else you will achieve con-
sistency.

Forcing a Page Break


Occasionally, you may want to force a page break, whether to ensure that a block of
related material is kept together or to allow several pages for material that will be
manually pasted in, such as a figure. The .SK (skip) macro forces a page break. The
text following this macro is output at the top of the next page. If supplied with an argu-
ment greater than 0, it causes that number of pages to be skipped before resuming the
output of text. The “blank” pages are printed, and they have the normal header and
footer.
On the next page, you will find a sample page from an
Alcuin manuscript printed with a 16-color plotter.
.SK 1

Formatting Lists
The mm macro package provides a variety of different formats for presenting a list of
items. You can select from four standard list types:

bulleted
dashed
numbered
alphabetized

In addition, you have the flexibility to create lists with nonstandard marks or text labels.
The list macros can also be used to produce paragraphs with a hanging indent.
Each list item consists of a special mark, letter, number, or label in a left-hand
column with a paragraph of text indented in a right-hand column.

Structuring a List
The list macros help to simplify what could be a much larger and more tedious formatting
task. Here’s the coding for the bulleted list just shown:
.BL
.LI
bulleted
.LI
dashed
.LI
numbered
.LI
alphabetized
.LE
The structure of text in the input file has three parts: a list-initialization macro (.BL),
an item-mark macro (.LI), and a list-end macro (.LE).
First, you initialize the list, specifying the particular macro for the type of list that
you want. For instance, .BL initializes a bulleted list.
You can specify arguments with the list-initialization macro that change the
indentation of the text and turn off the automatic spacing between items in the list. We
will examine these arguments when we look at the list-initialization macros in more
detail later.
Next, you specify each of the items in the list. The item-mark macro, .LI, is
placed before each item. You can enter one or more lines of text following the macro.
.BL
.LI
Item 1
.LI
Item 2
.LI
Item 3

When the list is formatted, the .LI macro provides a line of space before each item.
(This line can be omitted through an argument to the list-initialization macro if you
want to produce a more compact list. We’ll be talking more about this in a moment.)
The .LI macro can also be used to override or prefix the current mark. If a
mark is supplied as the only argument, it replaces the current mark. For example:

.LI o
Item 4

If a mark is supplied as the first argument, followed by a second argument of 1, then
the specified mark is prefixed to the current mark. The following:
.LI - 1
Item 5

would produce:

-• Item 5
A text label can also be supplied in place of the mark, but it presents some addi-
tional problems for the proper alignment of the list. We will look at text labels when
we discuss variable-item lists.
The .LI macro does not automatically provide spacing after each list item. An
argument of 1 can be specified if a line of space is desired.
The end of the list is marked by the list-end macro .LE. It restores page format-
ting settings that were in effect prior to the invocation of the last list-initialization
macro. The .LE macro does not output any space following the list unless you
specify an argument of 1. (Don’t specify this argument when the list is immediately
followed by a macro that outputs space, such as the paragraph macro.)
Be sure you are familiar with the basic structure of a list. A common problem is
not closing the list with .LE. Most of the time, this error causes the formatter to quit
at this point in the file. A less serious, but nonetheless frequent, oversight is omitting
the first .LI between the list-initialization macro and the first item in the list. The list
is output but the first item will be askew.
Here is a sample list:
.BL
.LI
Item 1
.LI
Item 2
.LI
Item 3
.LI o
Item 4
.LI - 1
Item 5
.LE

The troff output produced by the sample list is:

• Item 1

• Item 2

• Item 3

o Item 4

-• Item 5

Complete list structures can be nested within other lists up to six levels. Different
types of lists can be nested, making it possible to produce indented outline structures.
But, like nested if-then structures in a program, make sure you know which level you
are at and remember to close each list.
For instance, we could nest the bulleted list inside a numbered list. The list-
initialization macro .AL generates alphabetized and numbered lists.

.AL
.LI
Don't worry, we'll get to the list-initialization macro .AL.
You can specify five different variations of
alphabetic and numbered lists.
.BL
.LI
Item 1
.LI
Item 2
.LI
Item 3
.LE
.LI
We'll also look at variable-item lists.
.LE

This input produces the following formatted list from troff:



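1. Don't worry, we'll get to the list-initialization macro .AL.
   You can specify five different variations of alphabetic and
   numbered lists.

   • Item 1

   • Item 2

   • Item 3

2. We'll also look at variable-item lists.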

You may already realize the ease with which you can make changes to a list. The
items in a list can be easily put in a new order. New items can be added to a numbered
list without readjusting the numbering scheme. A bulleted list can be changed to an
alphabetized list by simply changing the list-initialization macro. And you normally
don’t have to be concerned with a variety of specific formatting requests, such as setting
indentation levels or specifying spacing between items.
On the other hand, because the structure of the list is not as easy to recognize in
the input file as it is in the formatted output, you may find it difficult to interpret com-
plicated lists, in particular ones that have been nested to several levels. The code-
checking program, checkmm, can help; in addition, you may want to format and print
repeatedly to examine and correct problems with lists.

Marked Lists
Long a standby of technical documents, a marked list clearly organizes a group of
related items and sets them apart for easy reading. A list of items marked by a bullet
(•) is perhaps the most common type of list. Another type of marked list uses a dash
(-). A third type of list allows the user to specify a mark, such as a square (□). The
list-initialization macros for these lists are:

.BL [text indent] [1]
.DL [text indent] [1]
.ML [mark] [text indent] [1]

With the .BL macro, the text is indented the same amount as the first line of an
indented paragraph. A single space is maintained between the bullet and the text. The
bullet is right justified, causing an indent of several spaces from the left margin.
As you can see from this nroff-formatted output, the bullet is simulated in
nroff by a + overstriking an o:

⊕ Alcuin/UNIX interface definition

⊕ Programmer's documentation for Alcuin


If you specify a text indent, the first character of the text will start at that position. The
position of the bullet is relative to the text, always one space to its left.
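Because the mark is right justified against the text, where the mark starts depends on how wide it is. This Python sketch uses fixed-width columns, as in nroff, and `marked_item` is a hypothetical name. It places a mark so that it ends one space before the text column. It also shows why a wider mark starts closer to the left margin.

```python
def marked_item(mark, text_indent, text):
    """Right-justify `mark` so it ends one space before column `text_indent`."""
    pad = text_indent - len(mark) - 1
    if pad < 0:
        raise ValueError("mark does not fit within the text indent")
    return " " * pad + mark + " " + text
```

A one-character bullet with a text indent of 5 starts in column 3. A two-character mark such as "[]" starts one column further left, while the text stays in column 5 either way.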
If the last argument is 1, the blank line of space separating items is omitted. If
you want to specify only this argument, you must specify either a value or a null value
("") for the text indent:
.BL "" 1
It produces a much more compact list:

⊕ GGS Technical Memo 3200
⊕ GGS Product Marketing Spec
⊕ Alcuin/UNIX interface definition
⊕ Programmer's documentation for Alcuin

Because the bullets produced by nroff are not always appropriate due to the
overstriking, a dashed list provides a suitable alternative. With the .DL macro, the
dash is placed in the same position as a bullet in a bulleted list. A single space is main-
tained between the dash and the text, which, like the text with a bulleted list, is indented
by the amount specified in the number register for indented paragraphs (Pi).
The nroff formatter supplies a dash that is a single hyphen, and troff sup-
plies an em dash. Because the em dash is longer, and the dash is right justified, the
alignment with the left margin is noticeably different. It appears left justified in
troff; in nroff, the dash appears indented several spaces because it is smaller.

The third chapter on the principles of computerized
font design should cover the following topics:

- Building a Font Dictionary

- Loading a Font

- Scaling a Font

You can specify a text indent and a second argument of 1 to inhibit spacing between
items.

With the .ML macro, you have to supply the mark for the list. Some possible
candidates are the square (enter \(sq to get □), the square root (enter \(sr to get
√), which resembles a check mark, and the gradient symbol (enter \(gr to get ∇).
The user-specified mark is the first argument.
.ML \(sq
Not all of the characters or symbols that you can use in troff will have the same
effect in nroff.
Unlike bulleted and dashed lists, text is not automatically indented after a user-
specified mark. However, a space is added after the mark. The following example of
an indented paragraph and a list, which specifies a square as a mark, has been formatted
using nroff. The square appears as a pair of brackets.

[] Remove old initialization files.

[] Run install program.

[] Exit to main menu and choose selection 3.

The user-supplied mark can be followed by a second argument that specifies a text
indent and a third argument of 1 to omit spacing between items.
The following example was produced using the list-initialization command:
.ML \(sq 5 1

The specified indent of 5 aligns the text with an indented paragraph:

     Check to see that you have completed the following
steps:

  [] Remove old initialization files.
  [] Run install program.
  [] Exit to main menu and choose selection 3.

Numbered and Alphabetic Lists


The .AL macro is used to initialize automatically numbered or alphabetized lists. The
syntax for this macro is:

.AL [type] [text indent] [1]

If no arguments are specified, the .AL macro produces a numbered list. For instance,
we can code the following paragraph with the list-initialization macro .AL:

User-oriented documentation recognizes three things:


.AL
.LI
that a new user needs to learn the system in stages,
getting a sense of the system as a whole while becoming
proficient in performing particular tasks;
.LI
that there are different levels of users, and not every
user needs to learn all the capabilities of the system
in order to be productive;
.LI
that an experienced user must be able to rely on the
documentation for accurate and thorough reference
information.
.LE
to produce a numbered list:

User-oriented documentation recognizes three things:

1.  that a new user needs to learn the system in stages,
    getting a sense of the system as a whole while
    becoming proficient in performing particular tasks;

2.  that there are different levels of users, and not
    every user needs to learn all the capabilities of
    the system in order to be productive;

3.  that an experienced user must be able to rely on the
    documentation for accurate and thorough reference
    information.
The number is followed by a period, and two spaces are maintained between the period
and the first character of text.
The level of text indent, specified in the number register Li, is 6 in nroff and
5 in troff. This value is added to the current indent. If a text indent is specified,
that value is added to the current indent, but it does not change the value of Li.
The third argument inhibits spacing between items in the list. Additionally, the
number register Ls can be set to a value from 0 to 6 indicating a nesting level. Lists
after this level will not have spacing between items. The default is 6, the maximum
nesting depth. If Ls were set to 2, lists only up to the second level would have a
blank line of space between items.
Other types of lists can be specified with .AL, using the first argument to
specify the list type, as follows:

Value  Sequence       Description

1      1, 2, 3        Numbered
A      A, B, C        Alphabetic (uppercase)
a      a, b, c        Alphabetic (lowercase)
I      I, II, III     Roman numerals (uppercase)
i      i, ii, iii     Roman numerals (lowercase)
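Generating the marks for each list type is mechanical. Here is a Python sketch of the five sequences in the table. It assumes each mark is followed by a period, as in the numbered example earlier, and the function names are illustrative. The sketch simply refuses letter lists longer than 26 items rather than guess how mm continues past Z.

```python
ROMAN = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
         (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
         (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]


def to_roman(n):
    """Convert a positive integer to uppercase Roman numerals."""
    out = []
    for value, symbol in ROMAN:
        count, n = divmod(n, value)
        out.append(symbol * count)
    return "".join(out)


def to_letter(n):
    """1 -> 'A' ... 26 -> 'Z'; longer lists are outside this sketch."""
    if not 1 <= n <= 26:
        raise ValueError("this sketch only handles 26 letters")
    return chr(ord("A") + n - 1)


def al_mark(list_type, n):
    """Return the mark for item n of an .AL list of the given type."""
    if list_type == "1":
        mark = str(n)
    elif list_type == "A":
        mark = to_letter(n)
    elif list_type == "a":
        mark = to_letter(n).lower()
    elif list_type == "I":
        mark = to_roman(n)
    elif list_type == "i":
        mark = to_roman(n).lower()
    else:
        raise ValueError(f"unknown .AL type {list_type!r}")
    return mark + "."
```

Each nesting level keeps its own counter, which is why the I, A, 1, a outline described next restarts at A, 1, or a whenever a new sublist opens.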

You can produce various list types by simply changing the type argument. You can
create a very useful outline format by nesting different types of lists. The example we
show of such an outline is one that is nested to four levels using I, A, 1, and a, in
that order. The rather complicated-looking input file is shown in Figure 6-4 (indented
for easier viewing of each list, although it could not be formatted this way), and the
nroff-formatted output is shown in Figure 6-5.
Another list-initialization macro that produces a numbered list is .RL (reference
list). The only difference is that the reference number is surrounded by brackets ([ ]).

.RL [text indent] [1]

The arguments have the same effect as those specified with the .AL macro. To initial-
ize a reference list with no spacing between items, use:

.RL "" 1

It produces the following reference list:


[1] The Main Menu
[2] Menus or Commands?
[3] Error Handling
[4] Getting Help
[5] Escaping to UNIX

Variable-Item Lists
With a variable-item list, you do not supply a mark; instead, you specify a text label
with each .LI. One or more lines of text following .LI are used to form a block
paragraph indented from the label. If no label is specified, a paragraph with a hanging
indent is produced. The syntax is:

.VL text indent [label indent] [1]


Unlike the other list-initialization macros, a text indent is required. By default, the label
is left justified, unless a label indent is given. If you specify both a text indent and a
label indent, the indent for the text will be added to the label indent.

.AL I
.LI
Quick Tour of Alcuin
  .AL A
  .LI
  Introduction to Calligraphy
  .LI
  Digest of Alcuin Commands
    .AL 1
    .LI
    Three Methods of Command Entry
      .AL a
      .LI
      Mouse
      .LI
      Keyboard
      .LI
      Light Pen
      .LE
    .LI
    Starting a Page
    .LI
    Drawing Characters
      .AL a
      .LI
      Choosing a Font
      .LI
      Switching Fonts
      .LE
    .LI
    Creating Figures
    .LI
    Printing
    .LE
  .LI
  Sample Illuminated Manuscripts
  .LE
.LI
Using Graphic Characters
  .AL A
  .LI
  Modifying Font Style
  .LI
  Drawing Your Own Font
  .LE
.LI
Library of Hand-Lettered Fonts
.LE

Fig. 6-4. Input for a Complex List



- 1 -

I.   Quick Tour of Alcuin

     A.  Introduction to Calligraphy

     B.  Digest of Alcuin Commands

         1.  Three Methods of Command Entry

             a.  Mouse

             b.  Keyboard

             c.  Light Pen

         2.  Starting a Page

         3.  Drawing Characters

             a.  Choosing a Font

             b.  Switching Fonts

         4.  Creating Figures

         5.  Printing

     C.  Sample Illuminated Manuscripts

II.  Using Graphic Characters

     A.  Modifying Font Style

     B.  Drawing Your Own Font

III. Library of Hand-Lettered Fonts

Fig. 6-5. Output of a Complex List



Variable-item lists are useful in preparing command reference pages, which
describe various syntax items, and glossaries, which present a term in one column and
its definition in the other. The text label should be a single word or phrase. The fol-
lowing example shows a portion of the input file for a reference page:
.VL 15 5
.LI figure
is the name of a cataloged figure. If
a figure has not been cataloged, you need to use
the LOCATE command.
.LI f:p
is the scale of the
figure in relation to the page.
.LI font
is the two-character abbreviation or
full name of one of the available fonts
from the Alcuin library.
.LE
The following variable-item list is produced:

     figure         is the name of a cataloged figure. If a
                    figure has not been cataloged, you need to
                    use the LOCATE command.

     f:p            is the scale of the figure in relation to
                    the page.

     font           is the two-character abbreviation or full
                    name of one of the available fonts from the
                    Alcuin library.
If you don't provide a text label with .LI or give a null argument (""), you will
get a paragraph with a hanging indent. If you want to print an item without a label,
specify a backslash followed by a space (\ ) or \0 after .LI. Similarly, if you want
to specify a label that contains a space, you should also precede the space with a
backslash and enclose the label within quotation marks:
.LI "point\ size"
or simply substitute a \0 for a space:
.LI point\0size

The first line of text is left justified (or indented by the amount specified in label
indent) and the remaining lines will be indented by the amount specified by text indent.
This produces a paragraph with a hanging indent:

.VL 15
.LI
There are currently 16 font dictionaries in the Alcuin
library. Any application may have up to 12 dictionaries
resident in memory at the same time.
.LE
When formatted, this item has a hanging indent of 15:

There are currently 16 font dictionaries in the Alcuin
               library. Any application may have up to
               12 dictionaries resident in memory at the
               same time.
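To make the two indents concrete, here is a Python sketch that lays out one variable-item entry in fixed-width columns, using the standard `textwrap` module. It follows the description in this section: text starts at the label indent plus the text indent, and an empty label gives a hanging indent. It also assumes the label is shorter than the text indent. `vl_item` is a hypothetical name, and this models the layout rather than reproducing what mm computes.

```python
import textwrap


def vl_item(label, text, text_indent, label_indent=0, width=60):
    """Lay out one .VL item as a list of fixed-width lines."""
    body = " " * (label_indent + text_indent)  # where wrapped lines start
    if label:
        if len(label) >= text_indent:
            raise ValueError("label must be shorter than the text indent")
        # Label at the label indent, padded so the text starts in the text column.
        first = " " * label_indent + label.ljust(text_indent)
    else:
        # No label: the first line starts at the label indent (hanging indent).
        first = " " * label_indent
    wrapper = textwrap.TextWrapper(width=width, initial_indent=first,
                                   subsequent_indent=body)
    return wrapper.wrap(text)
```

With a label, the first line of text lines up with every later line. Without one, only the first line sticks out to the left, which gives the hanging-indent paragraph shown above.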

Headings
Earlier we used the list macros to produce an indented outline. That outline, indented
to four levels, is a visual representation of the structure of a document. Headings per-
form a related function, showing how the document is organized into sections and sub-
sections. In technical documentation and book-length manuscripts, having a structure
that is easily recognized by the reader is very important.

Numbered and Unnumbered Headings


Using mm, you can have up to seven levels of numbered and unnumbered headings,
with variable styles. There are two heading macros: .H for numbered headings and
.HU for unnumbered headings. A different style for each level of heading can be speci-
fied by setting various number registers and defining strings.
Let's first look at how to produce numbered headings. The syntax for the .H
macro is:

.H level [heading text] [heading suffix]

The simplest use of the .H macro is to specify the level as a number between 1 and 7
followed by the text that is printed as a heading. If the heading text contains spaces,
you should enclose it within quotation marks. A heading that is longer than a single
line will be wrapped on to the next line. A multiline heading will be kept together in
case of a page break.
If you specify a heading suffix, this text or mark will appear in the heading but
will not be collected for a table of contents.
A top-level heading is indicated by an argument of 1:
.H 1 "Quick Tour of Alcuin"

The result is a heading preceded by a heading-level number. The first-level heading has
the number 1.
1. Quick Tour of Alcuin

A second-level heading is indicated by an argument of 2:

.H 2 "Introduction to Calligraphy"

The first second-level heading number is printed:

1.1 Introduction to Calligraphy

When another heading is specified at the same level, the heading-level number is
automatically incremented. If the next heading is at the second level:
.H 2 "Digest of Alcuin Commands"

it produces:
1.2 Digest of Alcuin Commands
Each time you go to a new (higher-numbered) level, .1 is appended to the number
representing the existing level. That number is incremented for each call at the same
level. When you back out of a level (for instance, from level 5 to 4), the counter for the
level you left (in this case, level 5) is reset to 0.
An unnumbered heading is really a zero-level heading:
.H 0 "Introduction to Calligraphy"

A separate macro, .HU, has been developed for unnumbered headings, although
its effect is the same.
.HU "Introduction to Calligraphy"

Even though an unnumbered heading does not display a number, it increments the
counter for second-level headings. Thus, in the following example, the heading "Intro-
duction to Calligraphy" is unnumbered, but it has the same effect on the numbering
scheme as if it had been a second-level heading (1.1).

1. Quick Tour of Alcuin

Introduction to Calligraphy

1.2 Digest of Alcuin Commands

If you are going to intermix numbered and unnumbered headings, you can change
the number register Hu to the lowest-level heading that is in the document. By chang-
ing Hu from 2 to a higher number:
.nr Hu 5
.H 1 "Quick Tour of Alcuin"
.HU "Introduction to Calligraphy"
.H 2 "Digest of Alcuin Commands"
the numbering sequence is preserved for the numbered heading following an unnum-
bered heading:

1. Quick Tour of Alcuin

Introduction to Calligraphy

1.1 Digest of Alcuin Commands
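The counter rules for .H and .HU, including the effect of Hu, can be modeled with a single list of seven counters. This Python class is a sketch of the rules described in this section, not mm's implementation. The method names are invented, and Hu is assumed to default to 2, as the text implies.

```python
class HeadingCounter:
    """Sketch of mm-style heading numbering for levels 1 through 7."""

    def __init__(self, hu_level=2):
        self.counters = [0] * 7
        self.hu_level = hu_level  # stands in for the Hu number register

    def _bump(self, level):
        if not 1 <= level <= 7:
            raise ValueError("heading level must be 1-7")
        self.counters[level - 1] += 1
        # Going back up a level resets every deeper counter to 0.
        for deeper in range(level, 7):
            self.counters[deeper] = 0

    def heading(self, level, text):
        """Like .H: level 0 behaves as an unnumbered heading."""
        if level == 0:
            return self.unnumbered(text)
        self._bump(level)
        number = ".".join(str(c) for c in self.counters[:level])
        if level == 1:
            number += "."  # first-level numbers print as "1."
        return f"{number} {text}"

    def unnumbered(self, text):
        """Like .HU: counted at level Hu, but no number is printed."""
        self._bump(self.hu_level)
        return text
```

With the default Hu of 2, the .HU heading consumes the number 1.1, so "Digest of Alcuin Commands" becomes 1.2. With Hu set to 5, the .HU heading only bumps a level-5 counter, and the next .H 2 is numbered 1.1.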
Headings are meant to be visible keys to your document’s structure. If you are using
unnumbered headings, it becomes even more important to make headings stand out. A
simple thing you can do is use uppercase letters for a first-level heading.
Here is a list of some of the other things you can do to affect the appearance of
headings, although some of the items depend upon whether you are formatting with
nroff or troff:

change to roman, italic, or bold font
change the point size of the heading
adjust spacing after the heading
center or left justify the heading
change the numbering scheme
select a different heading mark

The basic issue in designing a heading style is to help the reader distinguish between
different levels of headings. For instance, in an outline form, different levels of indent
show whether a topic is a section or subsection. Using numbered headings is an effec-
tive way to accomplish this. If you use unnumbered headings, you probably want to
vary the heading style for each level, although, for practical purposes, you should limit
yourself to two or three levels.
First, let’s look at what happens if we use the default heading style.
The first two levels of headings are set up to produce italicized text in troff
and underlined text in nroff. After the heading, there is a blank line before the first
paragraph of text. In addition, a top-level heading has two blank lines before the head-
ing; all the other levels have a single line of space.

1.2 Introduction to Calligraphy

Alcuin revitalizes an age-old tradition. Calligraphy, quite simply, is the art of
beautiful handwriting.

Levels three through seven all have the same appearance. The text is italicized or
underlined and no line break occurs. Two blank lines are maintained before and after
the text of the heading. For example:
0 ThemmMacros 0 165

1.2.1.3 Light Pen  The copyist’s pen and ink has been replaced by a light pen.
To change the normal appearance of headings in a document, you specify new
values for the two strings:

HF Heading font
HP Heading point size

You can specify individual settings for each level, up to seven values.
The font for each level of heading can be set by the string HF. The following
codes are used to select a font:

1 Roman
2 Italic
3 Bold

By default, the arguments for all seven levels are set to 2, resulting in italicized
headings in troff and underlining in nroff. Here the HF string specifies bold for
the top three levels followed by two italic levels:
.ds HF 3 3 3 2 2

If you do not specify a value for a level, it defaults to 1. Thus, in the previous
example, level 6 and 7 headings would be printed in a roman font.
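The defaulting rule is easy to model. The following Python sketch expands an HF string into one font per heading level, padding any missing levels with code 1. It illustrates the rule described above and is not how mm itself reads the string; the name heading_fonts is our own.

```python
# Font codes accepted in the HF string, as listed above.
FONT_NAMES = {1: "roman", 2: "italic", 3: "bold"}


def heading_fonts(hf: str, levels: int = 7) -> list:
    """Expand an HF string (e.g. "3 3 3 2 2") into a font name per heading level."""
    codes = [int(token) for token in hf.split()][:levels]
    # Levels with no value fall back to code 1 (roman).
    codes += [1] * (levels - len(codes))
    return [FONT_NAMES[code] for code in codes]


print(heading_fonts("3 3 3 2 2"))
# ['bold', 'bold', 'bold', 'italic', 'italic', 'roman', 'roman']
```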
The point size is set by the string HP. Normally, headings are printed in the
same size as the body copy, except for bold headings. A bold heading is reduced by 1
point when it is a standalone heading, as are the top-level headings. The HP string can
take up to seven arguments, setting the point size for each level.
.ds HP 14 14 12

If an argument is not given, or a null value or 0 is given, the default setting of 10 points
is used for that level. Point size can also be given relative to the current point size:
.ds HP +4 +4 +2
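The HP rules can be sketched the same way. This Python model assumes that the body copy is 10 points, so the "default" and "current" sizes are the same number. It also ignores the 1-point reduction for standalone bold headings. The name heading_sizes is hypothetical.

```python
def heading_sizes(hp: str, current: int = 10, levels: int = 7) -> list:
    """Resolve an HP string into an absolute point size per heading level."""
    sizes = []
    for token in hp.split()[:levels]:
        if token in ('""', "0"):
            sizes.append(current)                # null or 0: use the default size
        elif token[0] in "+-":
            sizes.append(current + int(token))   # relative to the current size
        else:
            sizes.append(int(token))             # absolute point size
    # Levels with no argument also get the default size.
    sizes += [current] * (levels - len(sizes))
    return sizes


print(heading_sizes("14 14 12"))   # [14, 14, 12, 10, 10, 10, 10]
print(heading_sizes("+4 +4 +2"))   # [14, 14, 12, 10, 10, 10, 10]
```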

A group of number registers control other default formats of headings:

Ej Eject page
Hb Break follows heading
Hc Center headings
Hi Align text after heading
Hs Vertical spacing after heading

For each of these number registers, you specify the number of the level at which some
action is to be turned on or off.
166 0 UNlX Text Processing 0

The Ej register is set to the highest-level heading, usually 1, that should start on
a new page. Its default setting is 0, which causes no page ejects. Setting it to 1
ensures that the major sections of a document will begin on their own page:
.nr Ej 1
The Hb register determines if a line break occurs after the heading. The Hs register
determines if a blank line is output after the heading. Both are set to 2 by default.
Settings of 2 mean that, for levels 1 and 2, the section heading is printed, followed by a
line break and a blank line separating the heading from the first paragraph of text. For
lower-level headings (levels greater than 2), the first paragraph follows immediately
on the same line.
The Hc register is set to the highest-level heading that you want centered.
Normally, this is not used with numbered headings and its default value is 0. However,
unnumbered heads are often centered. A setting of 2 will center first- and second-level
headings:
.nr Hc 2

With unnumbered headings, you also have to keep in mind that the value of Hc must be
greater than or equal to Hb and Hu. The heading must be on a line by itself; therefore
a break must be set in Hb for that level. The Hu register sets the level of an
unnumbered heading to 2, requiring that Hc be at least 2 to have an effect on
unnumbered headings.
There really is no way, using these registers, to get the first and second levels left
justified and have the rest of the headings centered.
The number register Hi determines the paragraph type for a heading that causes a
line break (Hb). It can be set to one of three values:

0 Left justified
1 Paragraph type determined by Pt
2 Indented to align with first character in heading

If you want to improve the visibility of numbered headings, set Hi to 2:
.nr Hi 2
It produces the following results:

4.1 Generating Output

    An Alcuin manuscript is a computer representation
    that has to be converted for output on various kinds
    of devices, including plotters and laser printers.

Changing the Heading Mark

Remember how the list-initialization macro .AL allowed you to change the mark used
for a list, producing an alphabetic list instead of a numbered list? These same options
are available for headings using the .HM macro.
The .HM macro takes up to seven arguments specifying the mark for each level.
The following codes can be specified:

1 Arabic
001 Arabic with leading zeros
A Uppercase alphabetic
a Lowercase alphabetic
I Uppercase roman
i Lowercase roman

If no mark is specified, the default numbering system (arabic) is used. Uppercase
alphabetic marks can be used in putting together a series of appendices. You can
specify A for the top level:
.HM A
and retain the default section numbering for the rest of the headings. This could
produce sections in the following series:

A, A.1, A.2, A.2.1, etc.

Marks can be mixed for an outline style similar to the one we produced using the list
macros:
.HM I A 1 a i

Roman numerals can be used to indicate sections or parts. If you specify:
.HM I i
the headings for the first two levels are marked by roman numerals. A third-level
heading is shown to demonstrate that the heading mark reverted to arabic by default:

I. Quick Tour of Alcuin

I.i Introduction to Calligraphy

I.ii Digest of Alcuin Commands

I.ii.1 Three Methods of Command Entry



When you use marks consisting of roman numerals or alphabetic characters, you might
not want the mark of the current level to be concatenated to the mark of the previous
level. Concatenation can be suppressed by setting the number register Ht to 1:
.HM I i
.nr Ht 1
Now, each heading in the list has only the mark representing that level:

I. Quick Tour of Alcuin

i. Introduction to Calligraphy

ii. Digest of Alcuin Commands

1. Three Methods of Command Entry
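To see how the .HM codes and the Ht register combine, here is a Python sketch that builds a heading number from the counter for each level. The trailing period on single-mark numbers follows the examples in this section. The handling of alphabetic marks past Z is an assumption. None of the function names come from mm.

```python
ROMAN = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"),
         (90, "XC"), (50, "L"), (40, "XL"), (10, "X"), (9, "IX"),
         (5, "V"), (4, "IV"), (1, "I")]


def to_roman(n: int) -> str:
    out = []
    for value, symbol in ROMAN:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def to_alpha(n: int) -> str:
    # 1 -> A ... 26 -> Z; continuing with AA after Z is an assumption.
    letters = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def mark(n: int, style: str) -> str:
    """Format one counter using an .HM code: 1, 001, A, a, I, or i."""
    if style.isdigit():
        return str(n).zfill(len(style))   # "001" pads to three digits
    if style in ("A", "a"):
        text = to_alpha(n)
    elif style in ("I", "i"):
        text = to_roman(n)
    else:
        raise ValueError(f"unknown mark style: {style!r}")
    return text if style.isupper() else text.lower()


def heading_number(counters: list, hm: str = "", ht: int = 0) -> str:
    """Build a heading number from per-level counters, e.g. [1, 2, 1]."""
    styles = hm.split()
    # Levels without an .HM code use arabic numbering.
    parts = [mark(n, styles[i] if i < len(styles) else "1")
             for i, n in enumerate(counters)]
    if ht == 1:
        parts = parts[-1:]                # Ht=1: only the current level's mark
    number = ".".join(parts)
    return number + "." if len(parts) == 1 else number


for counters in ([1], [1, 1], [1, 2], [1, 2, 1]):
    print(heading_number(counters, "I i"), heading_number(counters, "I i", ht=1))
```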

Table of Contents
Getting a table of contents easily and automatically is almost reason enough to justify
all the energy, yours and the computer’s, that goes into text processing. You realize
that this is something that the computer was really meant to do.
When the table of contents page comes out of the printer, a writer attains a state
of happiness known only to a statistician who can give the computer a simple instruc-
tion to tabulate vast amounts of data and, in an instant, get a single piece of paper list-
ing the results.
The reason that producing a table of contents seems so easy is that most of the
work is performed in coding the document. That means entering codes to mark each
level of heading and all the figures, tables, exhibits, and equations. Processing a table
of contents is simply a matter of telling the formatter to collect the information that’s
already in the file.
There are only two simple codes to put in a file, one at the beginning and one at
the end, to generate a table of contents automatically.
At the beginning of the file, you have to set the number register Cl to the level
of headings that you want collected for a table of contents. For example, setting Cl to
2 saves first- and second-level headings.
Place the .TC macro at the end of the file. This macro actually does the
processing and formatting of the table of contents. The table of contents page is output
at the end of a document.
A sample table of contents page follows. The header “CONTENTS” is printed
at the top of the page. At the bottom of the page, lowercase roman numerals are used
as page numbers.

                             CONTENTS

1. Quick Tour of Alcuin ........................................ 1
   1.1 Introduction to Calligraphy ............................. 3
   1.2 Digest of Alcuin Commands ............................... 8
   1.3 Sample Illuminated Manuscripts ......................... 21

2. Using Graphic Characters ................................... 31
   2.1 Scaling a Font ......................................... 33
   2.2 Modifying Font Style ................................... 37
   2.3 Drawing Your Own Font .................................. 41

3. Library of Hand-Lettered Fonts ............................. 51

                               - i -

One blank line is output before each first-level heading. All first-level headings are
left justified. Lower-level headings are indented so that they line up with the start of
text for the previous level.
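The collection rules above can be sketched in Python. Headings are saved down to level Cl, a blank line goes before each first-level entry, and each lower level is indented under the text of the level above. This is a simplified model with a hypothetical table_of_contents function and a fixed line width. mm does the equivalent work with its own macros when .TC is invoked.

```python
def table_of_contents(headings, cl=2, width=50):
    """headings: (level, number, title, page) tuples in document order."""
    lines = ["CONTENTS"]
    indent = {1: 0}                       # column where each level's entry starts
    for level, number, title, page in headings:
        if level > cl:
            continue                      # not collected: deeper than Cl
        if level == 1:
            lines.append("")              # one blank line before each first-level entry
        start = indent[level]
        # The next level down lines up with this entry's title text.
        indent[level + 1] = start + len(number) + 1
        left = " " * start + number + " " + title + " "
        right = " " + str(page)
        leader = "." * max(width - len(left) - len(right), 3)
        lines.append(left + leader + right)
    return lines


sample = [
    (1, "1.", "Quick Tour of Alcuin", 1),
    (2, "1.1", "Introduction to Calligraphy", 3),
    (3, "1.1.1", "Light Pen", 5),
    (1, "2.", "Using Graphic Characters", 31),
]
print("\n".join(table_of_contents(sample)))
```

With the default Cl of 2, the third-level "Light Pen" entry is left out. Passing cl=3 would include it, indented under "Introduction".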
If you have included various displays in your document, and used the macros
.FG, .TB, and .EX to specify captions and headings for the displays, this informa-
tion is collected and output when the .TC macro is invoked. A separate page is
printed for each accumulated list of figures, tables, and exhibits. For example:

LIST OF TABLES

TABLE 1. List of Required Resources ..................... 7

TABLE 2. List of Available Resources .................... 16

If you want the lists of displays to be printed immediately following the table of
contents (no page breaks), you can set the number register Cp to 1.
If you want to suppress the printing of individual lists, you can set the following
number registers to 0:

Lf If 0, no figures
Lt If 0, no tables
Lx If 0, no exhibits

In addition, there is a number register for equations that is set to 0 by default. If you
want equations marked by .EC to be listed, specify:
.nr Le 1

There is a set of strings, using the same names as the number registers, that define the
titles used for the top of the lists:

Lf LIST OF FIGURES
Lt LIST OF TABLES
Lx LIST OF EXHIBITS
Le LIST OF EQUATIONS

You can redefine a string using the .ds (define string) request. For instance, we can
redefine the title for figures as follows:
.ds Lf LIST OF ALCUIN DRAWINGS
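A small Python model shows how the registers and strings work together to decide which list pages are printed and how they are titled. It rests on three assumptions. Lf, Lt, and Lx default to 1, since the text only says that 0 suppresses them. The lists print in the order shown above. Every list has at least one entry. The name lists_to_print is hypothetical.

```python
DEFAULT_REGISTERS = {"Lf": 1, "Lt": 1, "Lx": 1, "Le": 0}
DEFAULT_TITLES = {
    "Lf": "LIST OF FIGURES",
    "Lt": "LIST OF TABLES",
    "Lx": "LIST OF EXHIBITS",
    "Le": "LIST OF EQUATIONS",
}


def lists_to_print(registers=None, titles=None):
    """Return the titles of the display lists that .TC would print."""
    regs = {**DEFAULT_REGISTERS, **(registers or {})}   # like .nr overrides
    names = {**DEFAULT_TITLES, **(titles or {})}        # like .ds overrides
    return [names[key] for key in ("Lf", "Lt", "Lx", "Le") if regs[key] != 0]


# .nr Le 1 and .ds Lf LIST OF ALCUIN DRAWINGS
print(lists_to_print({"Le": 1}, {"Lf": "LIST OF ALCUIN DRAWINGS"}))
```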

Footnotes and References

Footnotes and references present special problems, as anyone who has ever typed a term
paper knows. Fortunately, mm has two pairs of specialized macros. Each pair follows
a marker in the text and causes lines of delimited text to be saved and output either
at the bottom of the page, as a footnote, or at the end of the document, as a reference.

Footnotes
A footnote is marked in the body of a document by the string \*F. It follows immedi-
ately after the text (no spaces).
in an article on desktop publishing.\*F

The string F supplies the number for the footnote. It is printed (using troff) as a
superscript in the text and its value is incremented with each use.
The .FS macro indicates the start, and .FE the end, of the text for the footnote.
These macros surround the footnote text that will appear at the bottom of the page. The
.FS macro is put on the line immediately following the marker.
.FS
"Publish or Perish: Start-up grabs early page language lead,"
\fIComputerworld\fR, April 21, 1986, p. 1.
.FE

You can use labels instead of numbers to mark footnotes. The label must be specified
as a mark in the text and as an argument with .FS.
...in accord with the internal specs.[APS]
.FS [APS]
"Alcuin Product Specification," March 1986
.FE

You can use both numbered and labeled footnotes in the same document. All the
footnotes are collected and output at the bottom of each page underneath a short line rule.
If you are using troff, the footnote text will be set in a type size 2 points less than
the body copy.
If you want to change the standard format of footnotes, you can specify the .FD
macro. It controls hyphenation, text adjustment, indentation, and justification of the
label.
Normally, the text of a footnote is indented from the left margin and the mark or
label is left justified in relation to the start of the text. It is possible that a long
footnote could run over to the next page. Hyphenation is turned off so that a word will
not be broken at a page break. These specifications can be changed by giving a value
between 0 and 11 as the first argument with .FD, as shown in Table 6-3.

TABLE 6-3. .FD Argument Values

Argument  Hyphenation  Text Adjust  Text Indent  Label Justification
   0      no           yes          yes          left
   1      yes          yes          yes          left
   2      no           no           yes          left
   3      yes          no           yes          left
   4      no           yes          no           left
   5      yes          yes          no           left
   6      no           no           no           left
   7      yes          no           no           left
   8      no           yes          yes          right
   9      yes          yes          yes          right
  10      no           no           yes          right
  11      yes          no           yes          right
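Reading Table 6-3 column by column reveals a regular pattern. Odd arguments turn hyphenation on. Arguments 2, 3, 6, 7, 10, and 11 turn adjustment off. Arguments 4 through 7 turn indentation off. Arguments 8 and above right-justify the label. The Python sketch below encodes that reading of the table as bit tests. It describes the table and makes no claim about how mm implements .FD; fd_settings is a hypothetical name.

```python
def fd_settings(arg: int) -> dict:
    """Decode an .FD first argument (0-11) following the pattern in Table 6-3."""
    if not 0 <= arg <= 11:
        raise ValueError("the first .FD argument must be between 0 and 11")
    return {
        "hyphenation": bool(arg & 1),              # odd values hyphenate
        "adjust": not (arg & 2),                   # 2, 3, 6, 7, 10, 11: no adjust
        "indent": not (arg & 4),                   # 4 through 7: no indent
        "label": "right" if arg & 8 else "left",   # 8 through 11: right-justified label
    }


print(fd_settings(10))
# {'hyphenation': False, 'adjust': False, 'indent': True, 'label': 'right'}
```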

The second argument for .FD, if 1, resets the footnote numbering counter to 1.
This can be invoked at the end of a section or paragraph to initiate a new numbering
sequence. If specified by itself, the first argument must be null:
.FD "" 1

References
A reference differs from a footnote in that all references are collected and printed on a
single page at the end of the document. In addition, you can label a reference so that
you can refer to it later.

A reference is marked where it occurs in the text with \*(Rf. The formatter
converts the string into a value printed in brackets, such as [1]. The mark is followed
by a pair of macros surrounding the reference text. The .RS macro indicates the start,
and .RF the end, of the text for the reference.
You will find information on this page description language
in their reference manual, which has been published
as a book.\*(Rf
.RS
Adobe Systems, Inc. PostScript Reference Manual.
Reading, Massachusetts: Addison-Wesley; 1985.
.RF

You can also give .RS, as a string label argument, the name of a string that will be
assigned the current reference number. This string can be referenced later in the
document. For instance, if we had specified a string label in the previous example:
.RS As
We could refer back to the first reference in another place:
The output itself is a readable file which you can interpret
with the aid of the PostScript manual.\*(As

At the end of the document, a reference page is printed. The title printed on the
reference page is defined in the string Rp. You can replace “REFERENCES” with
another title simply by redefining this string with .ds.

REFERENCES

1. Adobe Systems, Inc.; PostScript Reference Manual.
   Reading, Massachusetts: Addison-Wesley; 1985.

In a large document, you might want to print a list of references at the end of a chapter
or a long section. You can invoke the .RP macro anywhere in a document.
.RP
.H 1 "Detailed Outline of User Guide"
It will print the list of references on a separate page and reset the reference counter to 0.
A reset argument and a paging argument can be supplied to change these actions. The
reset argument is the first value specified with the .RP macro. It is normally 0, which
resets the reference counter so that each section is numbered independently. If
reference numbering should be maintained in sequence for the entire document, specify a
value of 1.

The paging argument is the second value specified. It controls whether or not a
page break occurs before and after the list. It is normally set to 0, putting the list on a
new page. Specifying a value of 3 suppresses the page break before and after the list;
the result is that the list of references is printed following the end of the section and the
next section begins immediately after the list. A value of 1 will suppress only the page
break that occurs after the list and a value of 2 will suppress only the page break that
occurs before the list.
If you want an effect opposite that of the default settings, specify:
.RP 1 3

The first argument of 1 saves the current reference number for use in the next section or
chapter. The second argument of 3 inhibits page breaks before and after the list of
references.
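The two .RP arguments can be summarized in a short Python sketch. The values follow the description above. Treating the paging argument as two independent bits is our reading of how the values 1, 2, and 3 combine: 1 suppresses the break after the list, and 2 suppresses the break before it. rp_effect is a hypothetical helper.

```python
def rp_effect(reset: int = 0, paging: int = 0) -> dict:
    """Summarize what '.RP reset paging' does, per the description above."""
    return {
        "restart_numbering": reset == 0,    # 1 keeps numbering in sequence
        "break_before": not (paging & 2),   # 2 or 3 suppresses it
        "break_after": not (paging & 1),    # 1 or 3 suppresses it
    }


print(rp_effect())        # default: .RP
print(rp_effect(1, 3))    # .RP 1 3
```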

Extensions to mm
So far, we have covered most but not all of the features of the mm macro package.
We have not covered the Technical Memorandum macros, a set of specialized
macros for formatting technical memos and reports. Like the ones in the ms macro
package, these macros were designed for internal use at AT&T’s Bell Laboratories,
reflecting a company-wide set of standards. Anyone outside of Bell Labs will want to
make some modifications to the macros before using them. The Technical Memorandum
macros are a good example of employing a limited set of user macros to produce a
standard format. Seeing how they work will be especially important to those who are
responsible for implementing documentation standards for a group of people, some of
whom understand the basics of formatting and some of whom do not.
Writing or rewriting macros is only one part of the process of customizing mm.
The mm macros were designed as a comprehensive formatting system. As we’ve seen,
there are even macros to replace common primitive requests, like .sp. The developers
of mm recommend, in fact, that you not use nroff or troff requests unless
absolutely necessary, lest you interfere with the action of the macros.
Furthermore, as you will see if you print out the mm macros, the internal code of
mm is extraordinarily dense, and uses extremely un-mnemonic register names. This
makes it very difficult for all but the most experienced user to modify the basic
structure of the package. You can always add your own macros, as long as they don’t
conflict with existing macro and number register names, but you can’t easily go in and
change the basic macros that make up the mm package.
At the same time, the developers of mm have made it possible for the user to
make selective modifications, those for which mm has provided mechanisms in
advance. There are two such mechanisms:

mm’s use of number registers to control all aspects of document formatting
mm’s invocation of undefined (and therefore user-definable) macros at various
places in the mm code

The mm package is very heavily parameterized. Almost every feature of the formatting
system, from the fonts in which different levels of heading are printed to the size of
indents and the amount of space above and below displays, is controlled by values in
number registers. By learning and modifying these number registers, you can make
significant changes to the overall appearance of your documents.
In addition, there are a number of values stored in strings. These strings are used
like number registers to supply default values to various macros.
The registers you are most likely to want to change follow. Registers marked
with a dagger (†) can only be changed on the command line with the -r option (e.g.,
-rN4).

Cl   Level of headings saved for table of contents. See .TC macro.
     Default is 2.
Cp   If set to 1, lists of figures and tables appear on same page as table of
     contents. Otherwise, they start on a new page. Default is 1.
Ds   Sets the pre- and post-space used for static displays.
Fs   Vertical spacing between footnotes.
Hb   Level of heading for which break occurs before output of body text.
     Default is 2.
Hc   Level of heading for which centering occurs. Default is 0.
Hi   Indent type after heading. Default is 1 (paragraph indent). Legal
     values are: 0=left justified; 1=indented; 2=indented except
     after .H, .LC, .DE.
Hs   Level of heading for which space after heading occurs. Default is 2,
     i.e., space will occur after first- and second-level headings.
Hy   Sets hyphenation. If set to 1, enables hyphenation. Default is 0.
L†   Sets length of page. Default is 66v.
Li   Default indent of lists. Default is 5.
Ls   List spacing between items by level. Default is 6, which is spacing
     between all levels of list.
N†   Page numbering style. 0=all pages get header; 1=header printed as
     footer on page 1; 2=no header on page 1; 3=section page as footer;
     4=no header unless .PH defined; 5=section page and section figure
     as footer. Default is 0.
Np   Numbering style for paragraphs. 0=unnumbered; 1=numbered.
O    Offset of page. For nroff, this value is an unscaled number
     representing character positions. (Default is 9 characters; about .75i.)
     For troff, this value is scaled.

Of   Figure caption style. 0=period separator; 1=hyphen separator.
     Default is 0.
Pi   Amount of indent for paragraph. Default is 5 for nroff, 3n for
     troff.
Ps   Amount of spacing between paragraphs. Default is 3v.
Pt   Paragraph type. Default is 0.
S†   Default point size for troff. Default is 10. Vertical spacing is
     \nS+2.
Si   Standard indent for displays. Default is 5 for nroff, 3 for troff.
W    Width of page (line and title length). Default is 6i in troff, 60
     characters in nroff.

There are also some values that you would expect to be kept in number registers
that are actually kept in strings:

HF   Fonts used for each level of heading (1=roman, 2=italic, 3=bold)
HP   Point size used for each level of heading

For example, placing the following register settings at the start of your document:
.nr Hc 1
.nr Hs 3
.nr Hb 4
.nr Hi 2
.ds HF 3 3 3 3 2 2 2
.ds HP 16 14 12 10 10 10 10

will have the following effects:

Top-level headings (generated by .H 1) will be centered.
The first three levels of heading will be followed by a blank line.
The fourth-level heading will be followed by a break.
Fifth- through seventh-level headings will be run-in with the text.
All headings will have the following text indented under the first word of the
heading, so that the section number hangs in the margin.
The first four levels of heading will be in bold type; the fifth through seventh will
be italic.
A first-level heading will be printed in 16-point type; a second-level heading in
14-point type; a third-level heading in 12-point type; and all subsequent levels
in 10-point type.
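The interaction of Ej, Hb, Hs, and Hc can be modeled in a few lines of Python and checked against the effects just listed. The sketch assumes, as described earlier, that centering and the blank line apply only to headings that are followed by a break. heading_layout and its lowercase parameter names are our own.

```python
def heading_layout(level: int, ej: int = 0, hb: int = 2, hs: int = 2, hc: int = 0) -> dict:
    """Model how the Ej, Hb, Hs, and Hc registers treat one heading level."""
    standalone = level <= hb                  # a break follows: heading on its own line
    return {
        "new_page": level <= ej,
        "standalone": standalone,
        # Assumption: blank line and centering apply only to standalone headings.
        "blank_line_after": standalone and level <= hs,
        "centered": standalone and level <= hc,
    }


settings = {"hc": 1, "hs": 3, "hb": 4}        # .nr Hc 1, .nr Hs 3, .nr Hb 4
for level in range(1, 8):
    print(level, heading_layout(level, **settings))
```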

There isn’t space in this book for a comprehensive discussion of this topic. However, a
complete list of user-settable mm number registers is given in Appendix B. Study this
list, along with the discussion of the relevant macros, and you will begin to get a picture
of just how many facets of mm you can modify by changing the values in number
registers and strings.
The second feature, the provision of so-called “user exit macros” at various
points, is almost as ingenious. The following macros are available for user definition:
.HX .HY .HZ .PX .TX .TY

The .HX, .HY, and .HZ macros are associated with headings. The .HX macro is
executed at the start of each heading macro, .HY in the middle (to allow you to
respecify any settings, such as temporary indents, that were lost because of mm’s own
processing), and .HZ at the end.
By default, these macros are undefined. And, when troff encounters an undefined
macro name, it simply ignores it. These macros thus lie hidden in the code until
you define them. By defining these macros, you can supplement the processing of
headings without actually modifying the mm code. Before you define these macros, be
sure to study the mm documentation for details of how to use them.
Similarly, .PX is executed at the top of each page, just after .PH. Accordingly,
it allows you to perform additional top-of-page processing. (In addition, you can
redefine the .TP macro, which prints the standard header, because this macro is
relatively self-contained.)
There is a slightly different mechanism for generalized bottom-of-page processing.
The .BS/.BE macro pair can be used to enclose text that will be printed at the
bottom of each page, after any footnotes but before the footer. To remove this text after
you have defined it, simply specify an empty block.
The .VM (vertical margins) macro allows you to specify additional space at the
top of the page, bottom of the page, or both. For example:
.VM 3 3
will add three lines each to the top and bottom margins. The arguments to this macro
should be unscaled. The first argument applies to the top margin, the second to the
bottom.
The .TX and .TY macros allow you to control the appearance of the table of
contents pages. The .TX macro is executed at the top of the first page of the table of
contents, above the title; .TY is executed in place of the standard title
(“CONTENTS”).
In Chapter 14, you will learn about writing macro definitions, which should give
you the information you need to write these supplementary “user exit macros.”
CHAPTER 7

Advanced Editing

Sometimes, in order to advance, you have to go backward. In this chapter, we are
going to demonstrate how you can improve your text-editing skills by understanding
how line editors work. This doesn’t mean you’ll have to abandon full-screen editing.
The vi editor was constructed on top of a line editor named ex, which was an
improved version of another line editor named ed. So in one sense we’ll be looking at
the ancestors of vi. We’ll look at many of the ways line editors attack certain
problems and how that applies to those of us who use full-screen editors.
Line editors came into existence for use on “paper terminals,” which were basi-
cally printers. This was before the time of video display terminals. A programmer, or
some other person of great patience, worked somewhat interactively on a printer. Typi-
cally, you saw a line of your file by printing it out on paper; you entered commands
that would affect just that line; then you printed out the edited line again. Line editors
were designed for this kind of process, editing one line at a time.
People rarely edit files on paper terminals anymore, but there are diehards who
still prefer line editors. For one thing, a line editor imposes less of a burden on the
computer: it displays only the current line instead of updating the entire screen.
On some occasions, a line editor is simpler and faster than a full-screen editor.
Sometimes, a system’s response can be so slow that it is less frustrating to work if you
switch to a line editor. Or you may have occasion to work remotely over a dial-up line
operating at a baud rate that is too slow to work productively with a full-screen editor.
In these situations, a line editor can be a way to improve your efficiency. It can reduce
the amount of time you are waiting for the computer to respond to your commands.
The truth is, however, that after you switch from a screen editor to a line editor,
you are likely to feel deprived. But you shouldn’t skip this chapter just because you
won’t be using a line editor. The purpose of learning ex is to extend what you can do
in vi.

The ex Editor
The ex editor is a line editor with its own complete set of editing commands.
Although it is simpler to make most edits with vi, the line orientation of ex is an
advantage when you are making large-scale changes to more than one part of a file.
With ex, you can move easily between files and transfer text from one file to another
in a variety of ways. You can search and replace text on a line-by-line basis, or
globally. You can also save a series of editing commands as a macro and access them
with a single keystroke.
Seeing how ex works when it is invoked directly will help take some of the
“mystery” out of line editors and make it more apparent to you how many ex
commands work.
Let’s open a file and try a few ex commands. After you invoke ex on a file,
you will see a message about the total number of lines in the file, and a colon command
prompt. For example:
$ ex intro
“intro” 20 lines, 731 characters

You won’t see any lines in the file, unless you give an ex command that causes one or
more lines to be printed.
All ex commands consist of a line address, which can simply be a line number,
and a command. You complete the command with a carriage return. A line number by
itself is equivalent to a print command for that line. So, for example, if you type the
numeral 1 at the prompt, you will see the first line of the file:
:1
Sometimes, to advance,

To print more than one line, you can specify a range of lines. Two line numbers are
specified, separated by commas, with no spaces in between them:
:1,3
Sometimes, to advance,
you have to go backward.
Alcuin is a computer graphics tool

The current line is the last line affected by a command. For instance, before we issued
the command 1,3, line 1 was the current line; after that command, line 3 became the
current line. It can be represented by a special symbol, a dot (.).
:.,+3
that lets you design and create hand-lettered, illuminated
manuscripts, such as were created in the Middle Ages.

The previous command prints the current line and the three lines that follow it.
A + or - specifies a positive or negative offset from the current line.
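Address arithmetic like this is simple enough to model. The Python sketch below resolves a few address forms against the current line: a number, ., $, and a +n or -n offset. It is a hypothetical helper that covers only these forms. It is not a full ex address parser; search patterns, marks, %, and ; are left out.

```python
import re

# A base (number, "." or "$"), optionally followed by a +n or -n offset.
ADDRESS = re.compile(r"(\d+|\.|\$)?([+-]\d+)?")


def resolve_address(addr: str, current: int, last: int) -> int:
    """Resolve one address such as 3, ., $, +3, or .-2 to a line number."""
    match = ADDRESS.fullmatch(addr)
    if addr == "" or match is None:
        raise ValueError(f"unsupported address: {addr!r}")
    base, offset = match.groups()
    if base in (None, "."):
        line = current                     # "." or a bare offset: the current line
    elif base == "$":
        line = last                        # "$": the last line of the file
    else:
        line = int(base)
    if offset:
        line += int(offset)
    if not 1 <= line <= last:
        raise ValueError(f"address {addr!r} is out of range")
    return line


def resolve_range(spec: str, current: int, last: int) -> tuple:
    """Resolve 'a' or 'a,b'; both addresses are relative to the same current line."""
    parts = spec.split(",")
    if len(parts) > 2:
        raise ValueError(f"unsupported range: {spec!r}")
    start = resolve_address(parts[0], current, last)
    end = resolve_address(parts[-1], current, last)
    if start > end:
        raise ValueError(f"range {spec!r} is backwards")
    return start, end


current = 1
start, end = resolve_range("1,3", current, 20)
current = end                              # the last line printed becomes current
print(resolve_range(".,+3", current, 20))
```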
0 Advanced Editing 0 179

The ex editor has a command mode and an insert mode. To put text in a file,
you can enter the append or a command to place text on the line following the
current line. The insert or i command places text on the line above the current
line. Type in your text and when you are finished, enter a dot (.) on a line by itself:
:a
Monks, skilled in calligraphy,
labored to make copies of ancient
documents and preserve in a
library the works of many Greek and
Roman authors.
.

Entering the dot takes you out of insert mode and puts you back in command mode.
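The difference between a and i is only where the new lines go relative to the current line. In this Python sketch, the buffer is an ordinary 0-based list and line numbers are 1-based, as in ex. Each function returns the new current line, which the sketch assumes is the last line entered. append_lines and insert_lines are hypothetical names.

```python
def append_lines(buffer: list, current: int, new_lines: list) -> int:
    """Like ex 'a': add new_lines after line `current`; return the new current line."""
    buffer[current:current] = new_lines
    return current + len(new_lines)


def insert_lines(buffer: list, current: int, new_lines: list) -> int:
    """Like ex 'i': add new_lines above line `current`; return the new current line."""
    buffer[current - 1:current - 1] = new_lines
    return current - 1 + len(new_lines)


text = ["Alcuin is a computer graphics tool"]
current = append_lines(text, 1, ["Monks, skilled in calligraphy,",
                                 "labored to make copies of ancient"])
print(current, text[current - 1])
```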
A line editor does not have a cursor, and you cannot move along a line of text to
a particular word. Apart from not seeing more of your file, the lack of a cursor (and
therefore cursor motion keys) is probably the most difficult thing to get used to. After
using a line editor, you long to get back to using the cw command in vi.
If you want to change a word, you have to move to the line that contains the
word, tell the editor which word on the line you want to change, and then provide its
replacement. You have to think this way to use the substitute or s command.
It allows you to substitute one word for another.
We can change the last word on the first line from tool to environment:
:1
Alcuin is a computer graphics tool
:s/tool/environment/
Alcuin is a computer graphics environment

The word you want to change and its replacement are separated by slashes (/). As a
result of the substitute command, the line you changed is printed.
With a line editor, the commands that you enter affect the current line. Thus, we
made sure that the first line was our current line. We could also make the same change
by specifying the line number with the command:
:1s/environment/tool/
Alcuin is a computer graphics tool

If you specify an address, such as a range of line numbers, then the command will
affect the lines that you specify:
:1,20s/Alcuin/ALCUIN/
ALCUIN is named after an English scholar

The last line on which a substitution was made is printed.


Remember, when using a line editor, you have to tell the editor which line (or
lines) to work on as well as which command to execute.
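To make the line-by-line behavior of s concrete, here is a Python model of a ranged substitution without the g flag. Only the first match on each line in the range is replaced, and the function returns the number of the last changed line, since that is the line ex prints. Python’s re syntax differs from ex’s regular expressions (for instance, ex uses & in the replacement for the matched text), so treat this as a sketch for plain words only. substitute is a hypothetical name.

```python
import re


def substitute(buffer: list, start: int, end: int, pattern: str, replacement: str):
    """Model ':start,end s/pattern/replacement/' without the g flag.

    Only the first match on each line is replaced. Returns the number of
    the last line changed (the line ex would print), or None if no line matched.
    """
    last_changed = None
    for number in range(start, end + 1):
        new_text, count = re.subn(pattern, replacement, buffer[number - 1], count=1)
        if count:
            buffer[number - 1] = new_text
            last_changed = number
    return last_changed


lines = ["Alcuin is a computer graphics tool",
         "that lets you design",
         "Alcuin is named after an English scholar"]
printed = substitute(lines, 1, len(lines), "Alcuin", "ALCUIN")
print(lines[printed - 1])
```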

Another reason that knowing ex is useful is that sometimes when you are work-
ing in vi, you might unexpectedly find yourself using “open mode.” For instance, if
you press Q while in vi, you will be dropped into the ex editor. You can switch to
vi by entering the command vi at the colon prompt:
:vi

After you are in vi, you can execute any ex command by first typing a :
(colon). The colon appears on the bottom of the screen and what you type will be
echoed there. Enter an ex command and press RETURN to execute it.

Using ex Commands in vi
Many ex commands that perform normal editing operations have equivalent vi
commands that do the job in a simpler manner. Obviously, you will use dw or dd to
delete a single word or line rather than using the delete command in ex. However,
when you want to make changes that affect numerous lines, you will find that the
ex commands are very useful. They allow you to modify large blocks of text with a
single command.
Some of these commands and their abbreviations follow. You can use the full
command name or the abbreviation, whichever is easier to remember.

delete       d    Delete lines
move         m    Move lines
copy         co   Copy lines
substitute   s    Substitute one string for another

The substitute command best exemplifies the ex editor’s ability to make editing easier.
It gives you the ability to change any string of text every place it occurs in the file. To
perform edits on a global replacement basis requires a good deal of confidence in, as
well as full knowledge of, the use of pattern matching or “regular expressions.”
Although somewhat arcane, learning to do global replacements can be one of the most
rewarding experiences of working in the UNIX text-processing environment.
Other ex commands give you additional editing capabilities. For all practical
purposes, they can be seen as an integrated part of vi. Examples of these capabilities
are the commands for editing multiple files and executing UNIX commands. We will
look at these after we look at pattern-matching and global replacements.

Write Locally, Edit Globally


Sometimes, halfway through a document or at the end of a draft, you recognize incon-
sistencies in the way that you refer to certain things. Or, in a manual, some product
that you called by name is suddenly renamed (marketing!). Often enough, you have to
go back and change what you’ve already written in several places.
0 Advanced Editing 181

The way to make these changes is with the search and replace commands in ex.
You can automatically replace a word (or string of characters) wherever it occurs in the
file. You have already seen one example of this use of the substitute command, when
we replaced Alcuin with ALCUIN:
:1,20s/Alcuin/ALCUIN/

There are really two steps in using a search and replace command. The first step is to
define the area in which a search will take place. The search can be specified locally to
cover a block of text or globally to cover the entire file. The second step is to specify,
using the substitute command, the text that will be removed and the text that will
replace it.
At first, the syntax for specifying a search and replace command may strike you
as difficult to learn, especially when we introduce pattern matching. Try to keep in
mind that this is a very powerful tool, one that can save you a lot of drudgery. Besides,
you will congratulate yourself when you succeed, and everyone else will think you are
very clever.

Searching Text Blocks


To define a search area, you need to be more familiar with how line addressing works
in ex. A line address simply indicates which line or range of lines an ex command
will operate on. If you don't specify a line address, the command only affects the
current line. You already know that you can indicate any individual line by specifying
its number. What we want to look at now are the various ways of indicating a block of
text in a file.
You can use absolute or relative line numbers to define a range of lines. Identify
the line number of the start of a block of text and the line number of the end of the
block. In vi, you can use CTRL-G (^G) to display the current line number.
There are also special symbols for addressing particular places in the file:

.     Current line
$     Last line
%     All lines (same as 1,$)

The following are examples that define the block of text that the substitute command
will act upon:

:.,$s        Search from the current line to the end of the file
:20,.s       Search from line 20 through the current line
:.,.+20s     Search from the current line through the next 20 lines
:100,$s      Search from line 100 through the end of the file
:%s          Search all lines in the file

Within the search area, as defined in these examples, the substitute command will look
for one string of text and replace it with another string.
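To make the addressing rules concrete, here is a small Python sketch that turns the address forms above into a pair of line numbers. It is not how ex is implemented; the function name resolve_range is hypothetical, and only ., $, %, plain line numbers, and +N/-N offsets are handled.

```python
import re


def resolve_range(addr: str, current: int, last: int) -> tuple:
    """Convert a simplified ex address such as ".,$", "20,.", ".,.+20",
    or "%" into a (first, last) pair of 1-based line numbers."""
    if addr == "%":  # % is shorthand for 1,$
        return 1, last

    def one(part: str) -> int:
        # ".", "$", or a line number, optionally followed by +N or -N
        m = re.fullmatch(r"(\.|\$|\d+)([+-]\d+)?", part.strip())
        if m is None:
            raise ValueError(f"unsupported address: {part!r}")
        base, offset = m.groups()
        if base == ".":
            n = current
        elif base == "$":
            n = last
        else:
            n = int(base)
        return n + int(offset or 0)

    first, _, second = addr.partition(",")
    start = one(first)
    end = one(second) if second else start  # a single address means one line
    return start, end


# Assume the current line is 30 in a 250-line file.
for addr in (".,$", "20,.", ".,.+20", "100,$", "%"):
    print(addr, resolve_range(addr, current=30, last=250))
```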

You can also use pattern matching to specify a place in the text. A pattern is
delimited by a slash both before and after it.

:/pattern1/,/pattern2/s    Search from the first line containing pattern1 through the
                           first line containing pattern2
:.,/pattern/s              Search from the current line through the line containing
                           pattern

It is important to note that the action takes place on the entire line containing the pat-
tern, not simply the text up to the pattern.
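Pattern addresses can be modeled the same way. In this Python sketch, find_forward and pattern_range are hypothetical names and lines are indexed from 0. Each search starts on the line after the current one and wraps past the end of the buffer, as vi does when its wrapscan option is set. With a comma between the two addresses, both searches start from the current line; ex's semicolon separator, which first moves to the line found by the first search, is not modeled here.

```python
import re


def find_forward(lines, pattern, current):
    """Index of the next line after `current` that matches `pattern`,
    wrapping past the end of the buffer as vi does with wrapscan set."""
    n = len(lines)
    for step in range(1, n + 1):
        i = (current + step) % n
        if re.search(pattern, lines[i]):
            return i
    raise ValueError(f"pattern not found: {pattern!r}")


def pattern_range(lines, pat1, pat2, current):
    """Lines addressed by :/pat1/,/pat2/ (0-based indexes).

    With a comma, both searches start from the current line. Whole
    lines are returned, not just the text up to each match.
    """
    first = find_forward(lines, pat1, current)
    second = find_forward(lines, pat2, current)
    if first > second:
        raise ValueError("first address exceeds second")
    return lines[first:second + 1]


buffer = ["Contents", "Alcuin was a scholar", "at the court",
          "of Charlemagne.", "Index"]
print(pattern_range(buffer, "Alcuin", "Charlemagne", current=0))
```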

Search and Replace


You’ve already seen the substitute command used to replace one string with another
one. A slash is used as a delimiter separating the old string and the new. By prefixing
the s command with an address, you can extend its range beyond a single line:
:1,20s/Alcuin/ALCUIN/

Combined with a line address, this command searches all the lines within the block of
text. But it only replaces the first occurrence of the pattern on each line. For instance,
if we specified a substitute command replacing roman with Roman in the following
line:
after the roman hand. In teaching the roman script

only the first, not the second, occurrence of the word would be changed.
To change every occurrence on the line, you have to add a g at the end of the
command:
:s/roman/Roman/g

This command changes every occurrence of roman to Roman on the current line.
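Python's re.sub offers a close analogue of the two forms: count=1 replaces only the first match on the line, like :s/roman/Roman/, while count=0 replaces every match, like the trailing g. The helper name ex_substitute is hypothetical.

```python
import re


def ex_substitute(line, old, new, global_flag=False):
    """Approximate :s/old/new/ on one line; global_flag=True adds the g."""
    # count=0 tells re.sub to replace all matches; count=1 only the first.
    return re.sub(old, new, line, count=0 if global_flag else 1)


line = "after the roman hand. In teaching the roman script"
print(ex_substitute(line, "roman", "Roman"))
print(ex_substitute(line, "roman", "Roman", global_flag=True))
```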
Using search and replace is much faster than finding each instance of a string and
replacing it individually. It has many applications, especially if you are a poor speller.
So far, we have replaced one word with another word. Usually, it’s not that easy.
A word may have a prefix or suffix that throws things off. In a while, we will look at
pattern matching. This will really expand what you are able to do. But first, we want
to look at how to specify that a search and replace take place globally in a file.

Confirming Substitutions
It is understandable if you are over-careful when using a search and replace command.
It does happen that what you get is not what you expected. You can undo any search
and replacement command by entering u. But you don’t always catch undesired
changes until it is too late to undo them. Another way to protect your edited file is to
save the file with :w before performing a replacement. Then, at least you can quit the
file without saving your edits and go back to where you were before the change was
made. You can also use :e! to discard your edits and reread the last saved version of the file.

It may be best to be cautious and know exactly what is going to be changed in
your file. If you’d like to see what the search turns up and confirm each replacement
before it is made, add a c at the end of the substitute command:
:1,30s/his/the/gc

It will display the entire line where the string has been located, and the string itself
will be marked by a series of carets (^^^).
copyists at his school
            ^^^

If you want to make the replacement, you must enter y and press RETURN.
If you don’t want to make a change, simply press RETURN. In the next line, for
example, his turns up inside the word this, so you would decline the change:
this can be used for invitations, signs, and menus.
 ^^^

The combination of the vi commands // (repeat last search) and . (repeat last
command) is also an extraordinarily useful (and quick) way to page through a file and make
repetitive changes that require a judgment call rather than an absolute global replace-
ment.
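The c flag described above can also be modeled in Python. In this sketch, confirm_substitute and caret_marker are hypothetical names, and the decide callback stands in for the person at the keyboard typing y or pressing RETURN. The sample rule declines any match that sits inside a longer word, such as his in this.

```python
import re


def caret_marker(match):
    """The line of carets ex prints under a candidate match."""
    return " " * match.start() + "^" * (match.end() - match.start())


def confirm_substitute(lines, old, new, decide):
    """Sketch of :s/old/new/gc over several lines.

    decide(line, marker) stands in for the user: True means y, False
    means a plain RETURN, which leaves that match unchanged.
    """
    result = []
    for line in lines:
        def replace(match, line=line):
            if decide(line, caret_marker(match)):
                return new
            return match.group(0)
        result.append(re.sub(old, replace, line))
    return result


def whole_words_only(line, marker):
    """Sample decision rule: decline matches inside a longer word."""
    start = marker.index("^")
    end = start + marker.count("^")
    before = line[start - 1] if start > 0 else " "
    after = line[end] if end < len(line) else " "
    return not (before.isalpha() or after.isalpha())


text = ["copyists at his school",
        "this can be used for invitations, signs, and menus."]
print(confirm_substitute(text, "his", "the", whole_words_only))
```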

Global Search and Replace


When we looked at line addressing symbols, the percent symbol, %, was introduced. If
you specify it with the substitute command, the search and replace command will affect
all lines in the file:
:%s/Alcuin/ALCUIN/g

This command searches all lines and replaces each occurrence on a line.
There is another way to do this, which is slightly more complex but has other
benefits. The pattern is specified as part of the address, preceded by a g indicating that
the search is global:
:g/Alcuin/s//ALCUIN/g

It selects all lines containing the pattern Alcuin and replaces every occurrence of that
pattern with ALCUIN. Because the search pattern is the same as the word you want to
change, you don’t have to repeat it in the substitute command.
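A rough Python model of :g/pattern/s//replacement/g shows both halves of the command: the :g pattern selects lines, and an empty substitute pattern falls back to that same pattern. The function name global_substitute and its parameters are hypothetical, and Python's re syntax stands in for ex's regular expressions.

```python
import re


def global_substitute(lines, pattern, sub_pattern, replacement, every=True):
    """Model of :g/pattern/s/sub_pattern/replacement/[g].

    Lines that do not match `pattern` are left alone. An empty
    sub_pattern reuses `pattern`, the way s// does in ex, and every=False
    drops the trailing g so only the first match on each line changes.
    """
    selector = re.compile(pattern)
    target = re.compile(sub_pattern or pattern)
    count = 0 if every else 1  # count=0 means "replace all" in re
    return [target.sub(replacement, line, count=count)
            if selector.search(line) else line
            for line in lines]


text = ["Alcuin taught at York", "Alcuin wrote to Alcuin", "no match here"]
print(global_substitute(text, "Alcuin", "", "ALCUIN"))
```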
The extra benefit that this gives is the ability to search for a pattern and then
make a different substitution. We call this context-sensitive replacement.
The gist of this command is to globally search for a pattern:
:g/pattern/

Replace it:
:g/pattern/s//

or replace another string on that line:
:g/pattern/s/string/

with a new string:
:g/pattern/s/string/new/

and do this for every occurrence on the line:
:g/pattern/s/string/new/g

For example, we use the macro .BX to draw a box around the name of a special key.
To show an ESCAPE key in a manual, we enter:
.BX Esc

Suppose we had to change Esc to ESC, but we didn’t want to change any references to
Escape in the text. We could use the following command to make the change:
:g/BX/s/Esc/ESC/

This command might be phrased: “Globally search for each instance of BX and on
those lines replace Esc with ESC.” We didn’t specify g at the end of the
command because we would not expect more than one occurrence per line.
Actually, after you get used to this syntax, and admit that it is a little awkward,
you may begin to like it.
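The BX example shows why the separate :g pattern matters. In this Python sketch, g_substitute and the sample manual lines are made up for illustration; only lines that match BX are candidates, so the word Escape in running text is left alone.

```python
import re


def g_substitute(lines, context, old, new):
    """Sketch of :g/context/s/old/new/ (no trailing g): on each line that
    matches `context`, replace only the first match of `old`."""
    return [re.sub(old, new, line, count=1) if re.search(context, line) else line
            for line in lines]


manual = ["To leave insert mode, press",
          ".BX Esc",
          "The Escape key is at the upper left of the keyboard."]
print(g_substitute(manual, "BX", "Esc", "ESC"))
```

The restriction comes only from the BX context: on a line such as .BX Escape, the pattern Esc would still match and produce .BX ESCape, in this sketch just as in ex.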

Pattern Matching
If you are familiar with grep, then you know something about regular expressions. In
making global replacements, you can search not just for fixed strings of characters, but
also for patterns of words, referred to as regular expressions.
When you specify a literal string of characters, the search might turn up other
occurrences that you didn’t want to match. The problem with searching for words in a
file is that a word can be used in many different ways. Regular expressions help you
conduct a search for words in context.
Regular expressions are made up by combining normal characters with a number
of special characters. The special characters and their use follow.*

.          Matches any single character except newline.
*          Matches any number (including 0) of the single character (including a
           character specified by a regular expression) that immediately precedes
           it. For example, because . (dot) means any character, .* means
           match any number of any character.
[...]      Matches any one of the characters enclosed between the brackets.
           For example, [AB] matches either A or B. A range of consecutive
           characters can be specified by separating the first and last characters
           in the range with a hyphen. For example, [A-Z] will match any
           uppercase letter from A to Z and [0-9] will match any digit from 0
           to 9.
\{n,m\}    Matches a range of occurrences of the single character (including a
           character specified by a regular expression) that immediately precedes
           it. The n and m are integers between 0 and 256 that specify how
           many occurrences to match. \{n\} will match exactly n occurrences,
           \{n,\} will match at least n occurrences, and \{n,m\} will match any
           number of occurrences between n and m. For example, A\{2,3\}
           will match either AA (as in AARDVARK) or AAA, but will not match
           the single letter A.
^          Requires that the following regular expression be found at the begin-
           ning of the line.
$          Requires that the preceding regular expression be found at the end of
           the line.
\          Treats the following special character as an ordinary character. For
           example, \. stands for a period and \* for an asterisk.
\( \)      Saves the pattern enclosed between \( and \) in a special holding
           space. Up to nine patterns can be saved in this way on a single line.
           They can be “replayed” in substitutions by the escape sequences \1
           to \9.
\n         Matches the nth pattern previously saved by \( and \), where n is a
           number from 1 to 9 and previously saved patterns are counted from
           the left on the line.
\< \>      Matches characters at the beginning (\<) or at the end (\>) of a
           word. The expression \<ac would only match words that begin
           with ac, such as action but not react.
&          Represents the entire matched search pattern when used in a
           replacement string.
\u         Converts the first character of the replacement string to uppercase.
\U         Converts the replacement string to uppercase, as in :s/Unix/\U&/.
\l         Converts the first character of the replacement string to lowercase, as
           in :s/Act/\l&/.
\L         Converts the replacement string to lowercase.

*\( and \), and \{n,m\} are not supported in all versions of vi. \<, \>, \u, \U, \l, and \L are supported
only in vi/ex, and not in other programs using regular expressions.
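Python's re module does not support \u, \U, \l, or \L in replacement strings, but their effect can be approximated by passing a function as the replacement. In this sketch the names CASE_CONVERSIONS and ex_case_sub, and the mode letters u, U, l, L, are hypothetical; the function receives the whole match, which plays the role of ex's &.

```python
import re

# Each entry mimics one ex replacement escape applied to & (the whole match).
CASE_CONVERSIONS = {
    "u": lambda s: s[:1].upper() + s[1:],  # \u&  first character uppercase
    "U": str.upper,                        # \U&  whole match uppercase
    "l": lambda s: s[:1].lower() + s[1:],  # \l&  first character lowercase
    "L": str.lower,                        # \L&  whole match lowercase
}


def ex_case_sub(line, pattern, mode):
    """Replace the first match of pattern with a case-converted copy of it."""
    convert = CASE_CONVERSIONS[mode]
    return re.sub(pattern, lambda m: convert(m.group(0)), line, count=1)


print(ex_case_sub("a Unix system", "Unix", "U"))  # like :s/Unix/\U&/
print(ex_case_sub("Act now", "Act", "l"))         # like :s/Act/\l&/
```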

Unless you are already familiar with UNIX’s wildcard characters, this list of special
characters probably looks complex. A few examples should make things clearer. In the
examples that follow, a square (□) is used to mark a blank space.

Let’s follow how you might use some special characters in a replacement. Sup-
pose you have a long file and you want to substitute the word balls for the word ball
throughout that file. You first save the edited buffer with :w, then try the global
replacement:
:g/ball/s//balls/g

When you continue editing, you notice occurrences of words such as ballsoon,
globallsy, and ballss. Returning to the last saved buffer with :e!, you now try specifying
a space after ball to limit the search:
:g/ball□/s//balls□/g
But this command misses the occurrences ball., ball,, ball:, and so on. A better
solution is to restrict the match to whole words:
:g/\<ball\>/s//balls/g

By surrounding the search pattern with \< and \>, we specify that the pattern should
only match entire words, with or without a subsequent punctuation mark. Thus, it does
not match the word balls if it already exists.
Because \< and \> are only available in ex (and thus vi), you may have
occasion to use a longer form:
:g/ball\([□,.;:!?]\)/s//balls\1/g
This searches for and replaces ball followed by either a space (indicated by □) or any
one of the punctuation characters ,.;:!?. Additionally, the character that is matched
is saved using \( and \) and restored on the right-hand side with \1. The syntax
may seem complicated, but this command sequence can save you a lot of work in a
similar replacement situation.
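The three attempts can be replayed with Python's re module on a made-up sample sentence. Python writes ex's \< and \> as \b (a word boundary on either side of a word), and the longer bracket form translates almost literally.

```python
import re

text = "a ball, a balloon, globally, balls, and one more ball."

# :g/ball/s//balls/g changes every "ball", even inside other words.
naive = re.sub(r"ball", "balls", text)

# :g/\<ball\>/s//balls/g; Python writes ex's \< and \> as \b.
whole_word = re.sub(r"\bball\b", "balls", text)

# :g/ball\([ ,.;:!?]\)/s//balls\1/g; the captured character is put back.
longer_form = re.sub(r"ball([ ,.;:!?])", r"balls\1", text)

for result in (naive, whole_word, longer_form):
    print(result)
```

The bracket form only checks what follows ball, so in both ex and this sketch a word such as football, would also be changed; \< (or \b) guards the start of the word as well.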

Search for General Classes of Words


The special character & is used in the replacement portion of a substitution command
to represent the pattern that was matched. It can be useful in searching for and chang-
ing similar but different words and phrases.
For instance, a manufacturer decides to make a minor change to the names of
their computer models, necessitating a change in a marketing brochure. The HX5000
model has been renamed the Series HX5000, along with the HX6000 and HX8500
models. Here’s a way to do this using the & character:
:g/HX[568][05]00/s//Series &/g
This changes HX8500 to Series HX8500. The & character is useful when you want to
replay the entire search pattern and add to it. If you want to capture only part of the
search pattern, you must use \( and \) and replay the saved pattern with
\1 ... \n.
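In Python's re.sub, ex's & is spelled \g<0> in the replacement string. This sketch, run on an invented brochure sentence, adds Series in front of every matching model number.

```python
import re

brochure = "Compare the HX5000 with the HX6000 and the HX8500."

# :g/HX[568][05]00/s//Series &/g   (& becomes \g<0>, the whole match)
renamed = re.sub(r"HX[568][05]00", r"Series \g<0>", brochure)
print(renamed)
```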
For instance, the same computer manufacturer decides to drop the HX from the
model numbers and place Series after that number. We could make the change using
the following command:

:g/\(Series\) HX\([568][05]00\)/s//\2 \1/g

This command replaces Series HX8500 with 8500 Series.
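Saved patterns map directly onto Python groups: \( and \) become plain parentheses, and \1 and \2 work the same way in the replacement. A sketch on an invented sentence:

```python
import re

text = "Order the Series HX8500 or the Series HX5000 today."

# :g/\(Series\) HX\([568][05]00\)/s//\2 \1/g
swapped = re.sub(r"(Series) HX([568][05]00)", r"\2 \1", text)
print(swapped)
```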
Suppose you have subroutine names beginning with the prefixes mgi, mgr, and
mga.

mgibox routine
mgrbox routine
mgabox routine

If you want to save the prefixes, but want to change the name box to square, either of
the following replacement commands will do the trick:
:g/mg\([iar]\)box/s//mg\1square/

The global replacement keeps track of whether an i, a, or r is saved, so that only
box is changed to square. This has the same effect as the previous command:
:g/mg[iar]box/s/box/square/g
The result is:

mgisquare routine
mgrsquare routine
mgasquare routine

Block Move by Patterns


You can edit blocks of text delimited by patterns. For example, assume you have a
150-page reference manual. All reference pages are organized in the same way: a
paragraph with the heading SYNTAX, followed by DESCRIPTION, followed by
PARAMETERS. A sample of one reference page follows:
.Rh 0 "Get status of named file" "STAT"
.Rh "SYNTAX"
.nf
integer*4 stat, retval
integer*4 status (11)
character*123 filename
...
retval = stat (filename, status)
.fi
.Rh "DESCRIPTION"
Writes the fields of a system data structure into the
status array. These fields contain (among other
things) information about the file's location, access
privileges, owner, and time of last modification.
.Rh "PARAMETERS"
.IP "filename" 15n

A character string variable or constant containing
the UNIX pathname for the file whose status you want