S-PLUS 6 for Windows

Programmer’s Guide

July 2001

Insightful Corporation
Seattle, Washington
Proprietary Notice
Insightful Corporation owns both this software program and its
documentation. Both the program and documentation are
copyrighted with all rights reserved by Insightful Corporation.
The correct bibliographical reference for this document is as follows:
S-PLUS 6 for Windows Programmer’s Guide, Insightful Corporation,
Seattle, WA.
Printed in the United States.

Copyright Notice
Copyright © 1988-2001, Insightful Corporation. All rights reserved.


Insightful Corporation
1700 Westlake Avenue N, Suite 500
Seattle, Washington 98109
USA

ACKNOWLEDGMENTS

S-PLUS would not exist without the pioneering research of the Bell
Labs S team at AT&T (now Lucent Technologies): John Chambers,
Richard A. Becker (now at AT&T Laboratories), Allan R. Wilks (now
at AT&T Laboratories), Duncan Temple Lang, and their colleagues in
the statistics research departments at Lucent: William S. Cleveland,
Trevor Hastie (now at Stanford University), Linda Clark, Anne
Freeny, Eric Grosse, David James, José Pinheiro, Daryl Pregibon, and
Ming Shyu.
Insightful Corporation thanks the following individuals for their
contributions to this and earlier releases of S-PLUS: Douglas M. Bates,
Leo Breiman, Dan Carr, Steve Dubnoff, Don Edwards, Jerome
Friedman, Kevin Goodman, Perry Haaland, David Hardesty, Frank
Harrell, Richard Heiberger, Mia Hubert, Richard Jones, Jennifer
Lasecki, W.Q. Meeker, Adrian Raftery, Brian Ripley, Peter
Rousseeuw, J.D. Spurrier, Anja Struyf, Terry Therneau, Rob
Tibshirani, Katrien Van Driessen, William Venables, and Judy Zeh.

CONTENTS OVERVIEW


S-PLUS Language Programming

Chapter 1 The S-PLUS Language 1


Chapter 2 Data Objects 21
Chapter 3 Data Frames 43
Chapter 4 Writing Functions in S-PLUS 71
Chapter 5 Importing and Exporting 171
Chapter 6 Debugging Your Functions 205

Graphics

Chapter 7 Editable Graphics Commands 237


Chapter 8 Traditional Graphics 297
Chapter 9 Traditional Trellis Graphics 385

Advanced S-PLUS Programming

Chapter 10 Object-Oriented Programming in S-PLUS 479


Chapter 11 Programming the User Interface Using S-PLUS 501
Chapter 12 Customized Analytics: A Detailed Example 539


Connectivity and Interfaces

Chapter 13 Automation 583


Chapter 14 Calling S-PLUS Using DDE 625
Chapter 15 Interfacing With C and Fortran Code 635
Chapter 16 Using CONNECT/C++ 691

Advanced Topics

Chapter 17 Extending the User Interface 717


Chapter 18 The S-PLUS Command Line and the System Interface 833
Chapter 19 Computing on the Language 863
Chapter 20 Data Management 883
Chapter 21 Using Less Time and Memory 907
Chapter 22 Simulations in S-PLUS 929
Chapter 23 Evaluation of Expressions 943
Chapter 24 The Validation Suite 957

Index 969

CONTENTS

Chapter 1 The S-PLUS Language 1


Introduction to S-PLUS 2
Syntax of S-PLUS Expressions 7
Data Classes 11
The S-PLUS Programming Environment 14
Graphics Paradigms 17

Chapter 2 Data Objects 21


Introduction 22
Vectors 23
Structures 28
Lists 34
Factors and Ordered Factors 37

Chapter 3 Data Frames 43


Introduction 44
The Benefits of Data Frames 45
Creating Data Frames 46
Combining Data Frames 52
Applying Functions to Subsets of a Data Frame 59
Adding New Classes of Variables to Data Frames 65
Data Frame Attributes 68

Chapter 4 Writing Functions in S-PLUS 71


Introduction 73
The Structure of Functions 75
Operating on Subsets of Data 94
Organizing Computations 107
Specifying Argument Lists 121
Error Handling 127
Input and Output 130
Wrap-Up Actions 158
Writing Special Functions 162
References 169


Chapter 5 Importing and Exporting 171


Supported File Types for Importing and Exporting 172
Importing Data 176
Exporting Data 192
Exporting Graphs 196
Creating HTML Output 202

Chapter 6 Debugging Your Functions 205


Introduction 206
Basic S-PLUS Debugging 207
Interactive Debugging 212
Other Debugging Tools 231

Chapter 7 Editable Graphics Commands 237


Introduction 239
Getting Started 241
Graphics Objects 244
Graphics Commands 249
Plot Types 258
Titles and Annotations 281
Formatting Axes 287
Formatting Text 289
Layouts for Multiple Plots 293
Specialized Graphs Using Your Own Computations 295

Chapter 8 Traditional Graphics 297


Introduction 300
Getting Started with Simple Plots 301
Frequently Used Plotting Options 305
Visualizing One-Dimensional Data 316
Visualizing the Distribution of Data 322
Visualizing Three-Dimensional Data 330
Visualizing Multidimensional Data 335
Interactively Adding Information to Your Plot 339
Customizing Your Graphics 345
Controlling Graphics Regions 351
Controlling Text and Symbols 355
Controlling Axes 361


Controlling Multiple Plots 367


Adding Special Symbols to Plots 375
Traditional Graphics Summary 380
References 384

Chapter 9 Traditional Trellis Graphics 385


A Roadmap of Trellis Graphics 387
Giving Data to Trellis Functions 390
General Display Functions 394
Arranging Several Graphs on One Page 416
Multipanel Conditioning 418
General Options for Multipanel Displays 435
Scales and Labels 439
Panel Functions 445
Panel Functions and the Trellis Settings 450
Superposing Two or More Groups of Values on a Panel 454
Aspect Ratio 464
Data Structures 469
Summary of Trellis Functions and Arguments 473

Chapter 10 Object-Oriented Programming in S-PLUS 479


Introduction 480
Fundamentals of Object-Oriented Programming 482
Defining New Classes in S-PLUS 486
Editing Methods 492
Group Methods 493
Extraction and Replacement Methods 499

Chapter 11 Programming the User Interface Using S-PLUS 501


The GUI Toolkit 503
General Object Manipulation 506
Information On Classes 517
Information on Properties 520
Object Dialogs 523
Selections 528
Options 532
Graphics Functions 533
Utilities 536
Summary of GUI Toolkit Functions 537


Chapter 12 Customized Analytics: A Detailed Example 539


Overview of the Case Study 540
The Basic Function 542
Enhancing the Function 544
Creating the Gaussfit Class 548
Creating A Constructor 549
Constructing Methods 551
Customized Graphical User Interface 560
Writing Help Files 573
Distributing Functions 576

Chapter 13 Automation 583


Introduction 584
Using S-PLUS as an Automation Server 585
Using S-PLUS as an Automation Client 610
Automation Examples 620

Chapter 14 Calling S-PLUS Using DDE 625


Introduction 626
Working With DDE 627

Chapter 15 Interfacing With C and Fortran Code 635


Overview 637
A Simple Example: Filtering Data 638
Using the C and Fortran Interfaces 641
Calling C or Fortran Routines From S-PLUS 643
Writing C and Fortran Routines Suitable for Use with S-PLUS 648
Compiling and Dynamically Linking Your Code 649
Common Concerns In Writing C and Fortran Code for Use with S-PLUS 656
Using C Functions Built into S-PLUS 670
Calling S-PLUS Functions From C Code 674
The .Call Interface 681
Debugging Loaded Code 686
A Note on StatLib 690

Chapter 16 Using CONNECT/C++ 691


Simple Examples: An Application and a Callable Routine 692
CONNECT/C++ Class Overview 698
CONNECT/C++ Architectural Features 701


A Simple S-PLUS Interface 712

Chapter 17 Extending the User Interface 717


Overview 719
Menus 721
Toolbars and Palettes 730
Dialogs 742
Dialog Controls 756
Callback Functions 791
Class Information 797
Style Guidelines 805

Chapter 18 The S-PLUS Command Line and the System Interface 833
Using the Command Line 834
Command Line Parsing 837
Working With Projects 852
Enhancing S-PLUS 854
The System Interface 856

Chapter 19 Computing on the Language 863


Introduction 864
Symbolic Computations 866
Making Labels From Your Expressions 868
Creating File Names and Object Names 871
Building Expressions and Function Calls 872
Argument Matching and Recovering Actual Arguments 881

Chapter 20 Data Management 883


Introduction 884
Frames, Names and Values 885
Databases in S-PLUS 892
Matching Names and Values 903
Commitment of Assignments 904

Chapter 21 Using Less Time and Memory 907


Introduction 908
Time and Memory 909
Writing Good Code 917
Improving Speed 926


Chapter 22 Simulations in S-PLUS 929


Introduction 930
Working with Many Data Sets 931
Working with Many Iterations 932
Monitoring Progress 937
Example: A Simple Bootstrap Function 939
Summary of Programming Tips 941

Chapter 23 Evaluation of Expressions 943


Introduction 944
S-PLUS Syntax and Grammar 945

Chapter 24 The Validation Suite 957


Introduction 958
Outline of the Validation Routines 959
Running the Tests 963
Creating Your Own Tests 966

Index 969

THE S-PLUS LANGUAGE

Introduction to S-PLUS 2
Interpreted vs. Compiled Languages 3
Object-Oriented Programming 3
Versions of the S Language 4
Programming Tools in S-PLUS 5
Syntax of S-PLUS Expressions 7
Names and Assignment 8
Subscripting 9
Data Classes 11
The S-PLUS Programming Environment 14
Editing Objects 14
Functions and Scripts 14
Transferring Data Objects 15
Graphics Paradigms 17
Editable Graphics 17
Traditional Graphics 17
Traditional Trellis Graphics 17
Converting Non-editable Graphics to Editable Graphics 17
When to Use Each Graphics System 18

Chapter 1 The S-PLUS Language

INTRODUCTION TO S-PLUS
S-PLUS is a language specially created for exploratory data analysis
and statistics. You can use S-PLUS productively and effectively without
even writing a one-line program in the S-PLUS language. However,
most users begin programming in S-PLUS almost subconsciously—
defining functions to streamline repetitive computations, avoid typing
mistakes in multi-line expressions, or simply to keep a record of a
sequence of commands for future use. The next step is usually
incorporating flow-of-control features to reduce repetition in these
simple functions. From there it is a relatively short step to the creation
of entirely new modules of S-PLUS functions, perhaps building on the
object-oriented features that allow you to define new classes of objects
and methods to handle them properly.
In this book, we concentrate on describing how to use the language.
As with any good book on programming, the goal of this book is to
help you quickly produce useful S-PLUS functions, and then step back
and delve more deeply into the internals of the S-PLUS language.
Along the way, we will continually touch on those aspects of S-PLUS
programming that are either particularly effective (such as vectorized
arithmetic) or particularly troubling (memory use, for loops).
This chapter aims to familiarize you with the language, starting with a
comparison of interpreted and compiled languages. We then briefly
describe object-oriented programming as it relates to S-PLUS,
although a full discussion is deferred until Chapter 10, Object-
Oriented Programming in S-PLUS. We then describe the basic syntax
and data types in S-PLUS. Programming in S-PLUS does not require,
but greatly benefits from, programming tools such as text editors and
source control. We touch on these tools briefly in the section The
S-PLUS Programming Environment (page 14). Finally, we introduce
the various graphics paradigms, and discuss when each should be
used.

Note

This book is intended for use with the S-PLUS Professional Edition. The full functionality of the
S-PLUS language, described in these pages, is not available to Axum or S-PLUS Standard Edition
users.

Interpreted vs. Compiled Languages
Like Java, S-PLUS is an interpreted language, in which individual
language expressions are read and then immediately executed. The
S-PLUS interpreter, which carries out the actions specified by the
S-PLUS expressions, is always interposed between your S-PLUS
functions and the machine on which those functions are running.
C and Fortran, by contrast, are compiled languages, in which complete
programs in the language are translated by a compiler into the
appropriate machine language. Once a program is compiled, it runs
independently of the compiler. Interpreted programs, however, are
useless without their associated interpreter. Thus, anyone who wants
to use your S-PLUS programs needs to have a compatible version of
S-PLUS.
The great advantage of interpreted languages is that they allow
incremental development. You can write a function, run it, write another
function, run that, then write a third function that calls the previous
two. Incremental development is part of what makes S-PLUS an
excellent prototyping tool. You can create an empty shell of a function,
add features as desired, and relatively quickly create a working
version of virtually any application. You can then evaluate your
prototype to see if portions of the application might be more
efficiently coded in C or Fortran, and if so, easily incorporate that
compiled code into your finished S-PLUS application.
The disadvantage of interpreted languages is the overhead of the
interpreter. Compiled code runs faster and requires less memory than
interpreted code, in part because the compiler can look at the entire
program and optimize the machine code to perform the required
steps in the most efficient manner. Because there is no need for an
interpreter, more computer resources can be devoted to the compiled
program.

Object-Oriented Programming
Traditional computer programming, as the very name implies, deals
with programs, which are sequences of instructions that tell the
computer what to do. In the sense that a computer language is a
language, programs (in S-PLUS, functions) are verbs.
Object-oriented programming, by contrast, deals largely with nouns,
namely, the data objects that traditional programs manipulate. In
object-oriented programming, you start thinking about a type of
object and try to imagine all the actions you might want to perform
on objects of that type. You then define the actions specifically for that
type of object. Typically, the first such action is to create instances of
the type.
Suppose, for example, that you start thinking about some graphical
objects, more specifically, circles on the computer screen. You want to
be able to create circles, but you also want to be able to draw them,
redraw them, move them, and so on.
Using the object-oriented approach to programming, you would
define a class of objects called circle, then define a function for
generating circles. (Such functions are called generator functions.) What
about drawing, redrawing, and moving? All of these are actions that
may be performed on a wide variety of objects, but may well need to
be implemented differently for each. An object-oriented approach,
therefore, defines the actions generically, with generic functions called
draw, redraw, move, and so on.

The actual implementation of the action for a specific class is called a
method. For example, for our class circle we might define class-specific
methods for the functions draw, redraw, and move. S-PLUS
includes a mechanism for determining whether a function is generic,
and if so, determining the class of its arguments and calling the
appropriate method, so that, for example, if draw is generic and orb is
an object of class circle, the call draw(orb) would automatically call
the draw method for class circle, and draw orb.
We will take up object-oriented programming in detail in Chapter 10,
Object-Oriented Programming in S-PLUS.
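
As a rough illustration of the circle example, the sketch below uses the SV4-style functions setClass, setGeneric, and setMethod. The slot names (x, y, r) and the generator function circle are assumptions made for illustration, and this draw method only returns a text description instead of drawing anything. Chapter 10 gives the authoritative treatment.

```r
# Define a class "circle" with three numeric slots (hypothetical names).
setClass("circle", representation(x = "numeric", y = "numeric", r = "numeric"))

# Generator function: creates instances of class "circle".
circle <- function(x, y, r) new("circle", x = x, y = y, r = r)

# Generic function: names the action without committing to a class.
setGeneric("draw", function(object) standardGeneric("draw"))

# Method: the implementation of draw for class "circle".
setMethod("draw", "circle", function(object)
    paste("circle at (", object@x, ",", object@y, ") with radius", object@r))

orb <- circle(0, 0, 2)
described <- draw(orb)    # dispatches to the circle method
described
```

Because draw is generic, methods for other classes could be added later without changing any code that calls draw.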

Versions of the S Language
There are currently two distinct versions of the S language in
common use: the S Version 3 language that underlies S-PLUS 2000 for
Windows (and all earlier versions of S-PLUS for Windows, as well as
UNIX versions of S-PLUS from 3.0 to 3.4) and the S Version 4
language that underlies S-PLUS 5.0 and later on UNIX and S-PLUS 6
for Windows and later.
The S Version 3 language (referred to in this document as SV3)
introduced the modeling language that is the foundation for most
S-PLUS statistical and analytic functionality. It had a simple object-
oriented structure with a dispatch mechanism built on naming
conventions. It did not apply any class structure to existing S-PLUS
objects such as vectors and matrices.

The S Version 4 language (referred to in this document as SV4)
introduced a strongly-typed object-oriented structure similar to that in
C++; in SV4, all objects are assigned classes, the dispatch mechanism
for methods is much more sophisticated, and there are far stricter
controls over inheritance. In particular, multiple inheritance is no
longer supported. If you are new to S-PLUS, you will be using SV4
from the start and you will find the instructions in this manual
accurate. If you have been working with S-PLUS a while, you may
find that certain S expressions you may have used in the past yield
different answers under SV4. We have tried to cover most of the
serious differences in output between SV3 and SV4 in the Release
Notes and in the appendix on migrating your code to S-PLUS 6.0 in
the User’s Guide.

Programming Tools in S-PLUS
There are two main tools for developing S-PLUS programs: the
Commands window and Script windows. The Commands window
will be familiar to all users of S-PLUS prior to version 4. Only one
Commands window can be open at a time; the easiest way to open it
is to click its Standard toolbar button.

Figure 1.1: The Commands window button, found on the Standard toolbar.

The > prompt in the Commands window indicates S-PLUS is ready
for your input. You can now type expressions for S-PLUS to interpret.
Throughout this book, we show typed commands preceded by the
S-PLUS prompt, as in the following example, because this
representation matches what you see in the Commands window:

> plot(corn.rain)

If you type in examples from the text, or cut and paste examples from
the on-line manuals, be sure to omit the prompt character. To exit the
Commands window, simply use the close window tool on the top
right of the window. The command

> q()

will close down S-PLUS altogether.


The fix function is available from the Commands window to let you
edit a function within a text editor.

Script windows, on the other hand, do not execute each statement as
it is typed in, nor is there a prompt character. They are for developing
longer S-PLUS programs, and for building programs from a variety of
sources, such as the History log.
For your first sessions programming in S-PLUS, we recommend you
use the Commands window.


SYNTAX OF S-PLUS EXPRESSIONS


You interact with S-PLUS by typing expressions, which the S-PLUS
interpreter evaluates and executes. S-PLUS recognizes a wide variety
of expressions, but in interactive use the most common are names,
which return the current definition of the named data object, and
function calls, which carry out a specified computation. Typing the
name of a built-in S-PLUS function, for example, shows the current
definition of the function:

> sqrt
function(x)
.Call("S_c_use_method", "sqrt")

A name is any combination of letters, numerals, and periods (.) that
does not begin with a numeral.

Note

This definition applies to syntactic names, that is, names recognized by the S-PLUS interpreter as
names. S-PLUS provides a mechanism by which virtually any character string, including non-
syntactic names, can be supplied as the name of the data object. This mechanism is described in
Chapter 20, Data Management.

S-PLUS is case sensitive, so that x and X are different names. A
function call is usually typed as a function name followed by an
argument list (which may be empty) enclosed in parentheses:

> plot(corn.rain)
> mean(corn.rain)
[1] 10.78421

All S-PLUS expressions return a value, which may be NULL. Normally,
this return value is automatically printed. Some functions, however,
such as graphsheet, plot, and q are called primarily for their side
effects, such as starting or closing a graphics device, plotting points, or
ending an S-PLUS session. Such functions frequently have the
automatic printing of their values suppressed.

If you type an incomplete expression (for example, by omitting the
closing parenthesis in a function call), S-PLUS provides a continuation
prompt (+, by default) to indicate that more input is required to
complete the expression.
Infix operators are functions with two arguments that have the special
calling syntax arg1 op arg2. For example, consider the familiar
mathematical operators:

> 2 + 7
[1] 9
> 12.4 / 3
[1] 4.133333
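
Because infix operators are ordinary functions, they can also be called in the usual function-call form by quoting the operator name. The short sketch below compares the two forms. The object names a and b are illustrative only.

```r
# The infix form and the function-call form carry out the same computation.
a <- 2 + 7          # infix call
b <- "+"(2, 7)      # the same operator called as an ordinary function
a
b

# The division operator works the same way.
"/"(12.4, 3)
```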

Names and Assignment
One of the most frequently used infix operators is the assignment
operator <- (and its equivalents, the equal sign, =, and the
underscore, _) used to associate names and values. For example, the
expression

> aba <- 7

associates the value 7 with the name aba. The value of an assignment
expression is the assigned value, that is, the value on the right side of
the assignment arrow. Assignment suppresses automatic printing, but
you can use the print function to force S-PLUS to print the
expression’s value as follows:

> print(aba <- 7)
[1] 7

If we now type the name aba, we see the stored value:

> aba
[1] 7

The value on the right of the assignment arrow can be any S-PLUS
expression; the left side can be any syntactic name or character string.
There are a few reserved names, such as if and function (see footnote 1).
Assignments typed at the S-PLUS prompt are permanent; objects
created in this way endure from session to session, until removed.

1. The complete list is as follows: if, is, else, for, while, repeat,
next, break, in, function, return, TRUE, T, FALSE, F, NULL, NA,
Inf, NaN.

Assignments within functions, however, are local to the function; they
endure only as long as the call to the function in which they occur. For
a complete discussion of assignment, see Chapter 20, Data
Management.
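
The difference between assignments at the prompt and assignments within functions can be seen in a small sketch. The function name f and the values used are hypothetical.

```r
aba <- 7                 # assignment at the top level

f <- function() {
    aba <- 100           # local: exists only during this call to f
    aba * 2
}

result <- f()            # the function sees its own local aba
result
aba                      # the top-level aba is unchanged
```

Here the call to f returns 200, while the object aba created at the top level still holds 7 afterward.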
Object names must begin with a letter or period, and may include any
combination of upper and lower case letters, numbers and periods
("."). For example, mydata, my.data and my.data.1 are all legal
names. Note the use of the period to enhance readability. With
S-PLUS 5.x and later, another naming convention arose, where words
previously separated by periods were simply concatenated, with the
second and subsequent words having initial caps, as in the following:
exportData, getCurrDirectory, findClassObjects.

Subscripting
Another common operator is the subscript operator [, used to extract
subsets of an S-PLUS data object. The syntax for subscripting is
object[subscript]
Here object can be any S-PLUS object and subscript typically takes one
of the following forms:
• Positive integers corresponding to the position in the data
object of the desired subset. For example, the letters data set
consists of the 26 lowercase letters. We can pick the third
letter using a positive integer subscript as follows:

> letters[3]
[1] "c"

• Negative integers corresponding to the position in the data
object of points to be excluded:

> letters[-3]
[1] "a" "b" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n"
[14] "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"

• Logical values; true values correspond to the points in the
desired subset, false values correspond to excluded points:

> i <- 1:26
> i
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
[18] 18 19 20 21 22 23 24 25 26
> i < 13
[1] T T T T T T T T T T T T F F F F F F F F F F F F F F
> letters[i < 13]
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l"

Subscripting is extremely important in making efficient use of S-PLUS
because it emphasizes treating data objects as whole entities, rather
than as collections of individual observations. This point of view is
central to S-PLUS’s utility as a data analysis computing environment.
For a full discussion of subscripting, see the section Operating on
Subsets of Data on page 94.
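
The three subscript forms described above can be tried side by side on one small numeric vector. The vector x and its values are made up for illustration.

```r
x <- c(5, 12, 3, 8, 20)

first.third <- x[c(1, 3)]    # positive integers: keep elements 1 and 3
tail.part   <- x[-(1:2)]     # negative integers: drop elements 1 and 2
big         <- x[x > 6]      # logical values: keep elements greater than 6

first.third
tail.part
big
```

In each case the subscript operates on the whole vector at once, with no explicit loop over individual elements.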


DATA CLASSES
Everything in S-PLUS is an object. All objects have a class. An S-PLUS
expression (itself an object) is interpreted by the S-PLUS evaluator and
returns a value, another object that can be assigned a name. An
object’s class determines the representation of the object, that is, what
types of information can be found within the object, and where that
information can be found. Most information about an object is
contained within specialized structures called slots.
The simplest data objects are one-dimensional arrays called vectors,
consisting of any number of elements corresponding to individual data
points. The simplest elements are literal expressions that, either singly
or matched like-with-like, produce the following classes:
• logical: The values T (or TRUE) and F (or FALSE).
• integer: Integer values such as 3 or -4.
• numeric: Floating-point real numbers (double-precision by
default). Numerical values can be written as whole numbers
(for example, 3., -4.), decimal fractions (4.52, -6.003), or in
scientific notation (6.02e23, 8e-47).
• complex: Complex numbers of the form a + bi, where a and
b are integers or numeric (for example, 3 + 1.23i).

• character: Character strings enclosed by matching double
quotes (") or apostrophes ('), for example, "Alabama",
'idea'.

These simple element classes are listed in order from least
informative (logical) to most informative (character); this order is
important when considering data objects formed by combining
elements from different classes.
The number of elements in a data object determines the object’s
length. A vector of length 1 can be created simply by typing a literal
and pressing ENTER:

> 7.4
[1] 7.4
> "hello"
[1] "hello"


To combine multiple elements into a vector, use the c function:

> c(T,F,T)
[1] T F T
> c(8.3, 9.2, 11)
[1] 8.3 9.2 11.0

If you try to combine elements of different classes into a single vector,
S-PLUS coerces all the elements to the most informative class (see
footnote 2):

> c(T, 8.3, 5)
[1] 1.0 8.3 5.0
> c(8.3, 9 + 6i)
[1] 8.3+0i 9.0+6i
> c(T, 8.3, "hello")
[1] "TRUE" "8.3" "hello"

You can obtain the class and length of any data object using the class
and length functions, respectively:

> class(c(T, 8.3, 5.0))
[1] "numeric"
> length(c(T, 8.3, "hello"))
[1] 3

The most generally useful of the recursive data types is the list,
which can be used to combine arbitrary collections of
S-PLUS data objects into a single object. For example, suppose you
have a vector x of character data, a matrix y of logical data, and a
time series z as shown below:

> x <- c("Tom", "Dick", "Harry")
> y <- matrix(c(T, F, T, F), ncol=2)
> z <- ts(sin(1:36), start=1989)

You can combine these into a single S-PLUS data object (of class
"list") using the list function:

2. This statement about coercion applies strictly only to the five simple
classes described on page 11. These simple classes correspond
roughly to what S version 3 and earlier referred to as modes.
(Although objects of class "integer" have mode "numeric".) The
concept of modes persists in S version 4, but it has been almost
entirely superseded by the new class mechanism.

> mylist <- list(x=x, y=y, z=z)
> mylist
$x:
[1] "Tom" "Dick" "Harry"

$y:
     [,1] [,2]
[1,]    T    T
[2,]    F    F

$z:
1989: 0.841470985 0.909297427 0.141120008 -0.756802495
1993: -0.958924275 -0.279415498 0.656986599 0.989358247
1997: 0.412118485 -0.544021111 -0.999990207 -0.536572918
2001: 0.420167037 0.990607356 0.650287840 -0.287903317
2005: -0.961397492 -0.750987247 0.149877210 0.912945251
2009: 0.836655639 -0.008851309 -0.846220404 -0.905578362
2013: -0.132351750 0.762558450 0.956375928 0.270905788
2017: -0.663633884 -0.988031624 -0.404037645 0.551426681
2021: 0.999911860 0.529082686 -0.428182669 -0.991778853

The list class is an extremely powerful tool in S-PLUS, and we shall
use it extensively throughout this book.
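
Once a list has been built, its components can be retrieved by name. The sketch below rebuilds a smaller version of mylist (without the time series) and uses the $ and [[ operators, which are not introduced in this section and are shown here only as a preview.

```r
x <- c("Tom", "Dick", "Harry")
y <- matrix(c(T, F, T, F), ncol = 2)
mylist <- list(x = x, y = y)

names(mylist)                  # the component names
second <- mylist$x[2]          # extract component x, then its second element
corner <- mylist[["y"]][1, 2]  # extract component y, then row 1, column 2
second
corner
length(mylist)                 # the number of components, not of data points
```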


THE S-PLUS PROGRAMMING ENVIRONMENT


S-PLUS uses tools available in the Windows environment. Some of
these tools are built into S-PLUS as functions—for example, the edit
function, which allows you to edit with the Windows Notepad editor.
Windows software, including spreadsheets such as Microsoft Excel
and word processors such as Microsoft Word, can be called from
S-PLUS using the dos and system functions.
In this section, we give a brief introduction to the most common tools
for writing, editing, and testing your S-PLUS functions, as well as tools
for transferring data objects between computers with differing
architectures.

Editing Objects
You can edit S-PLUS data by using the fix function.

> fix(x)

The fix function uses an editor you specify with the S-PLUS editor
option. At the S-PLUS prompt, type the following:

> options(editor="editor")

where editor is the binary executable (.exe) that runs your favorite text
editor. To set this option for each S-PLUS session, add the expression
to your .First function. This option defaults to Notepad in S-PLUS.
Once you’ve set up S-PLUS to work with your favorite editor, writing
and testing new functions requires following the simple sequence of
writing the function, running the function, editing the function, and so
on.

Functions and Scripts
Writing functions is the preferred way to incorporate new
functionality into S-PLUS. Functions allow you to combine a series of
S-PLUS expressions into a single executable call. Every function
returns a single value, which for functions built from multiple
expressions is the value of the last expression in the function’s body.
Sometimes, however, you may be interested in some or all of the
intermediate results of the combined expressions. You can (as we shall
see in the section Data Output on page 130) pull the intermediate results
together into a return list. In other cases, you may want those
intermediate results to be stored as individual data objects. In such
cases, it makes sense to program your task as an S-PLUS script, which
is just a text file containing valid S-PLUS expressions.
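
The return-list approach mentioned above can be sketched as follows. The function center.scale and its component names are hypothetical; the point is only that the last expression in the body, here a call to list, becomes the value of the function.

```r
# Hypothetical function that returns its intermediate results in a list.
center.scale <- function(x) {
    m <- mean(x)                 # intermediate result: the mean
    s <- sqrt(var(x))            # intermediate result: the standard deviation
    z <- (x - m) / s             # final computation: the standardized data
    list(mean = m, sd = s, scaled = z)   # last expression = return value
}

res <- center.scale(c(2, 4, 6))
res$mean
res$sd
res$scaled
```

A script version of the same task would consist of the three assignments alone, leaving m, s, and z behind as individual data objects.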
You can run S-PLUS scripts in any of the following ways:
1. Call the source function in the Commands window.
2. Load the script into a Script window, highlight the required
code, and click the Run button on the Script toolbar. See
Chapter 11, Using the Script and Report Windows, in the
User’s Guide.
3. Use the S-PLUS BATCH utility.
The methods differ primarily in that S-PLUS BATCH runs as a
background task and produces a file containing both the input and
the output of the job (you can suppress the input). This is frequently
useful if you have a complicated debugging task and need to recreate
the output of a number of expressions.

Transferring Data Objects
S-PLUS runs on a variety of hardware platforms with a variety of
architectures. The binary representation of S-PLUS objects varies from
platform to platform, so if you want to share your functions or data
sets with users on other platforms, you need to first dump them to a
portable ASCII format with one of several S-PLUS functions, transfer
the ASCII file, then restore them using one of several S-PLUS
functions.
The functions for dumping and restoring come in pairs: dump with
source, and data.dump with data.restore. Objects dumped with dump
must be restored with source—the ASCII form produced by dump is
just an S-PLUS script, which you can read or edit just like any text file.
Objects dumped by data.dump result in files that are not S-PLUS
scripts; in fact, these files are in a special format that was not intended
to be read by humans. Such objects can be restored only by using the
data.restore function. The data.dump and data.restore functions
are much faster than the dump and source functions, and should
always be used when transferring large data sets, such as image data.
The dump function should be used when you want to transfer an
object, such as a function definition, that may need editing before
being restored.
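
The dump and source pairing can be sketched as follows. The function cube and the temporary file are hypothetical; in practice you would copy the dumped file to the other platform before calling source there:

```r
# A small function to transfer (hypothetical example).
cube <- function(x) x^3

# dump writes an ASCII script that recreates the named object.
dump.file <- tempfile()
dump("cube", dump.file)

# Simulate the receiving side: remove the object, then restore it.
rm(cube)
source(dump.file)
cube(3)
```

Because the dumped file is ordinary text, you can open it in an editor and change the function definition before restoring it.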

The functions data.dump and data.restore are used for importing
and exporting files with the S-PLUS transport file format (see Chapter
5, Importing and Exporting, for more details).


GRAPHICS PARADIGMS
In S-PLUS there are three basic graphics paradigms, which we will
refer to as Editable Graphics, Traditional Graphics, and Traditional
Trellis Graphics.

Editable Graphics

Editable object-oriented graphics objects represent complete plots, or
elements added to plots such as lines, comments, and legends. The
plots generated from the plot palettes are each a single graph type
with sub-objects representing points, lines, axes, and more.
While most users will generate these graphs through menus and
toolbars, commands are also available to generate the plots
programmatically. This graphics system is new to S-PLUS version 4.
Chapter 7, Editable Graphics Commands, describes these graphics.

Traditional Graphics

Traditional S-PLUS language functions are available to create plots
identical to those in previous versions of S-PLUS.
Chapter 8, Traditional Graphics, describes these graphics.

Traditional Trellis Graphics

The Trellis graphics paradigm provides multipanel conditioning to
effectively discover relationships present in data. These graphics were
implemented using calls to the traditional graphics language
functions.
Chapter 9, Traditional Trellis Graphics, describes using conditioning
with the object-oriented plots.

Converting Non-editable Graphics to Editable Graphics

By default, traditional graphics commands produce a single
composite graphics object which renders quickly. This object may be
annotated but its individual components are not available for editing.
To edit individual components, such as points and lines in the graph,
first convert the graph to individual graphics objects by right-clicking
on the graph and selecting Convert to Objects from the context menu.

The conversion step may be avoided by creating editable graphics
objects directly. To turn on this editable graphics mode, press the
Editable Graphs button on the Commands window toolbar.
Alternatively, you may open a Graph sheet device in editable graphics
mode using

> graphsheet(object.mode="object-oriented")

However, as editable graphics are slower to render than non-editable
graphics, we strongly recommend creating non-editable graphics and
converting them to editable graphics, when needed, rather than using
object-oriented mode.
Traditional Trellis graphs are created by changing the axis system for
each panel, strip, and plot. This corresponds to a large number of plot
and graph objects in the editable graphics system. Due to the
complexity of the plots produced by traditional Trellis we strongly
recommend that non-editable graphics mode be used when
producing traditional Trellis plots.

When to Use Each Graphics System

The existence of multiple interconnected graphics systems is largely
due to the evolution of S-PLUS as graphics methodology and
technology have evolved. Here we describe the genesis of each system
and the benefits that result.

Traditional Graphics

The traditional graphics system is based on the pioneering work by
researchers at AT&T Bell Labs in graphical layout and perception. It
is optimized to provide smart default formatting and layout, while
providing programmatic specification of plot characteristics at a fine
level of control. These graphics have become the standard in
statistical publication-quality graphics due to their refined look and
ease of use.
As they pre-date modern object-oriented programming, they are
based on the rendering of low level graphics components such as
points and lines rather than on higher-level graphics objects. This
provides quicker rendering than editable graphics but does not yield
a high-level graphics object which may be accessed for editing. To
change a traditional graph the model is to regenerate the graph with
new specifications rather than to modify a graph object, although the
ability to convert to editable graphics does introduce the capability of
editing low level graph components.

Traditional graphics are produced by the techniques in the statistics
dialogs for speed of rendering and consistency with previous versions
of S-PLUS. It is likely that users will want to use traditional graphics
for similar reasons. Routines which use these graphics are
widespread, and their usage is well documented in both these
manuals and third party texts. Also, additional graphics methods are
available through traditional graphics which have not been
implemented as editable graphics.

Traditional Trellis Graphics

Trellis graphics are a powerful technique for exploring multivariate
structure in data. They were implemented in traditional graphics for
convenience and to make them available to all S-PLUS users. This
implementation is described in Chapter 9, Traditional Trellis
Graphics.
Trellis conditioning has been incorporated directly into the editable
graphics system, making the power of multipanel conditioning
available in all editable graphs. Due to the complexity of Trellis plots,
the point-and-click graph property specification is a much more
convenient way to develop a Trellis graph.
Traditional Trellis graphics will be of interest to users wanting more
control over the contents of each panel than is available in the
editable graphics. Also, additional graphics methods are available
through traditional Trellis graphics which have not been implemented
as editable graphics.

Editable Graphics

Editable graphics are new to S-PLUS version 4. They have been
developed based on modern C++ object-oriented programming
structures. As such they are based on a model of creating an object of
a particular class with properties containing a description of the
object. The user edits the object by modifying its properties. Multiple
graphics objects form an object hierarchy of plots within graphs
within Graph sheets which together represent a graphic.
Programmers used to using this type of object-oriented programming
will prefer to program by creating and modifying editable graphics
objects. Users of previous versions of S-PLUS may want to transition
towards using editable graphics when doing so provides benefits not
available with the traditional graphics, and continue to use traditional
graphics when they can leverage their existing experience to get
superior results.

These graphics, introduced in S-PLUS 4 for Windows, are not
available to users running S-PLUS for UNIX. If you will be making
your functions available to users on both Windows and UNIX
platforms you will need to use traditional graphics and traditional
Trellis graphics, rather than editable graphics.

Chapter 2 DATA OBJECTS

Introduction 22
Vectors 23
    Coercion of Values 23
    Creating Vectors 24
    Naming Vector Elements 26
Structures 28
    Matrices 28
    Arrays 31
Lists 34
    Creating Lists 35
    Naming Components 36
Factors and Ordered Factors 37
    Creating Factors 38
    Creating Ordered Factors 40
    Creating Factors From Continuous Data 41

INTRODUCTION
When using S-PLUS, you should think of your data sets as data objects
belonging to a certain class. Each class has a particular representation,
often defined as a named list of slots. Each slot, in turn, contains an
object of some other class.
The class of an object defines how the object is represented and
determines what actions may be performed on the object and how
those actions are performed. Among the most common classes of data
objects are numeric, character, factor, list, and data.frame.
The simplest type of data object in S-PLUS is the atomic vector, a one-
way array of n elements of a single mode (for example, numbers) that
can be indexed numerically. Atomic vectors are so called to indicate
that in S-PLUS they are indeed fundamental objects. All of S-PLUS’s
basic mathematical operations and data manipulation functions are
designed to work on the vector as a whole, although individual
elements of the vector can be extracted using their numerical indices.
More complicated data objects can be constructed from atomic
vectors in one of two basic ways:
1. By allowing complete S objects as elements, or
2. By building new data classes from old ones using slots.
Objects that contain other S objects as elements are called recursive
objects and include such common S-PLUS objects as lists and data
frames. A list is a vector for which each element is a distinct S object,
of any type. A data frame is essentially a list in which each of the
elements is an atomic vector, and all of the elements have the same
length. With slots, you can uniquely define a new class of data object
by storing the defining information (that is, the object’s attributes) in
one or more slots.
Data objects can contain not only logical, numeric, complex, and
character values, but also functions, operators, function calls, and
evaluations. All the different types (classes) of S-PLUS objects can be
manipulated in the same way: saved, assigned, edited, combined, or
passed as arguments to functions. This general definition of data
objects, coupled with class-specific methods, forms the backbone of
object-oriented programming and provides exceptional flexibility in
extending the capabilities of S-PLUS.


VECTORS
The simplest type of data object in S-PLUS is a vector. A vector is
simply an ordered set of values. The order of the values is emphasized
because ordering provides a convenient way of extracting the parts of
a vector. To extract individual elements, use their numerical indices
with the subscript operator [:

> car.gals[c(1,3,5)]
[1] 13.3 11.5 14.3

All elements within an atomic vector must be from only one of seven
atomic modes—logical, numeric, single, integer, complex, raw, or
character. (An eighth atomic mode, NULL, applies only to the NULL
vector.) The number of elements and their mode completely define
the data object as a vector. The class of any vector is the mode of its
elements:

> class(c(T,T,F,T))
[1] "logical"
> class(c(1,2,3,4))
[1] "integer"
> class(c(1.24,3.45, pi))
[1] "numeric"

The number of elements in a vector is called the length of the vector
and can be obtained for any vector using the length function:

> length(1:10)
[1] 10

Coercion of Values

When values of different modes are combined into a single atomic
object, S-PLUS converts, or coerces, all values to a single mode in a way
that preserves as much information as possible. The basic modes can
be arranged in order of increasing information—logical, integer,
numeric, complex, and character. Thus, mixed values are all
converted to the mode of the value with the most informative mode.
For example, suppose we combine a logical value, a numeric value,
and a character value, as follows:

> c(T, 2, "seven")
[1] "TRUE"  "2"     "seven"

S-PLUS coerces all three values to mode character because this is the
most informative mode represented. Similarly, in the following
example, all the values are coerced to mode numeric:

> c(T, F, pi, 7)
[1] 1.000000 0.000000 3.141593 7.000000

When logical values are coerced to integers, TRUE values become the
integer 1 and FALSE values become the integer 0.
The same kind of coercion occurs when values of different modes are
combined in computations. For example, logical values are coerced
to zeros and ones in integer or numeric computations.
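
A short sketch of coercion in computations follows. The vector passed is a hypothetical example; the point is that arithmetic on logical values treats T as 1 and F as 0:

```r
passed <- c(T, F, T, T)

# Logical values are coerced to 1 (T) and 0 (F) in arithmetic.
sum(passed)     # number of T values: 3
mean(passed)    # proportion of T values: 0.75

# The most informative mode present determines the mode of the result.
mode(c(T, 2))            # "numeric"
mode(c(T, 2, "seven"))   # "character"
```

Summing a logical vector is a common way to count how many elements satisfy a condition.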

Creating Vectors

If you want to create a vector, you can do so in a number of ways.
You have seen that you can combine arbitrary values to create a
vector with the c function and type in data from the keyboard or a
data file with the scan function.
Other functions are useful for repeating values or generating
sequences of numeric values. The rep function repeats a value; you
specify either a times argument or a length argument. If times is
specified, the value is repeated the number of times specified (the
value may be a vector):

> rep(NA,5)
[1] NA NA NA NA NA
> rep(c(T,T,F),2)
[1] T T F T T F

If times is a vector with the same length as the vector of values being
repeated, each value is repeated the corresponding number of times.

> rep(c("yes","no"),c(4,2))
[1] "yes" "yes" "yes" "yes" "no" "no"
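
The examples above all use times. As a small sketch of the length argument mentioned earlier, the values are repeated cyclically until the result has the requested number of elements (the object name cycled is hypothetical):

```r
# Repeat the values 1, 2, 3 cyclically until there are 7 elements.
cycled <- rep(c(1, 2, 3), length = 7)
cycled    # 1 2 3 1 2 3 1
```

This form is convenient when the required length is not a whole multiple of the number of values being repeated.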

The sequence operator (:) generates sequences of numeric values spaced
one unit apart, starting from its left operand.

> 1:5
[1] 1 2 3 4 5
> 1.2:4
[1] 1.2 2.2 3.2
> 1:-1
[1] 1 0 -1

More generally, the seq function generates sequences of numeric
values with an arbitrary increment. For example:

> seq(-pi,pi,.5)
[1] -3.1415927 -2.6415927 -2.1415927 -1.6415927 -1.1415927
[6] -0.6415927 -0.1415927 0.3584073 0.8584073 1.3584073
[11] 1.8584073 2.3584073 2.8584073

You can specify the length of the vector and seq computes the
increment:

> seq(-pi,pi,length=10)
[1] -3.1415927 -2.4434610 -1.7453293 -1.0471976 -0.3490659
[6] 0.3490659 1.0471976 1.7453293 2.4434610 3.1415927

Or, you can specify the beginning, the increment, and the length with
either the length argument or the along argument:

> seq(1,by=.05,length=10)
[1] 1.00 1.05 1.10 1.15 1.20 1.25 1.30 1.35 1.40 1.45
> seq(1,by=.05,along=1:5)
[1] 1.00 1.05 1.10 1.15 1.20

See the help file for seq for more information on the length and
along arguments.

To “initialize” a vector of a certain mode and length before you know
the actual values, use the vector function. This function takes two
arguments: the first specifies the mode and the second specifies the
length:

> vector("logical",3)
[1] F F F

The functions logical, integer, numeric, complex, and character
generate vectors of the named mode. Each of these functions takes a
single argument that specifies the length of the vector. Thus,
logical(3) generates the same initialized vector as above.
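
A typical use of an initialized vector is to reserve space and then fill it in element by element. The following is a minimal sketch; the names n and squares are hypothetical:

```r
n <- 5
squares <- numeric(n)      # five zeros, mode numeric

# Fill in the values once they are known.
for(i in 1:n) squares[i] <- i^2
squares                    # 1 4 9 16 25
```

For a simple case like this, the vectorized expression (1:n)^2 gives the same result; the initialization pattern matters when each value needs a separate computation.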

Table 2.1: Useful functions for creating vectors.

Function    Description                     Examples
--------    -----------                     --------
scan        Reads values, any mode          scan(), scan("data")
c           Combines values, any mode       c(1,3,2,6), c("yes","no")
rep         Repeats values, any mode        rep(NA,5), rep(c(1,2),3)
:           Numeric sequences               1:5, 1:-1
seq         Numeric sequences               seq(-pi,pi,.5)
vector      Initializes vectors             vector('complex',5)
logical     Initializes logical vectors     logical(3)
integer     Initializes integer vectors     integer(4)
numeric     Initializes numeric vectors     numeric(5)
complex     Initializes complex vectors     complex(6)
character   Initializes character vectors   character(7)

Naming Vector Elements

You can assign names to vector elements to associate specific
information, such as case labels or value identifiers, with each value of
the vector. To create a vector with named values, you assign the
names with the names function:

> numbered.letters <- letters
> names(numbered.letters) <- paste("obs",1:26,sep="")
> numbered.letters
 obs1 obs2 obs3 obs4 obs5 obs6 obs7 obs8 obs9 obs10 obs11
  "a"  "b"  "c"  "d"  "e"  "f"  "g"  "h"  "i"   "j"   "k"
 obs12 obs13 obs14 obs15 obs16 obs17 obs18 obs19 obs20 obs21
   "l"   "m"   "n"   "o"   "p"   "q"   "r"   "s"   "t"   "u"
 obs22 obs23 obs24 obs25 obs26
   "v"   "w"   "x"   "y"   "z"

In the above example, the first 26 integers are converted to character
strings by the paste function and then attached to each value as its
name. The quotes around the names are suppressed in printing. The actual
values of the vector numbered.letters are character strings, each
containing one letter.
If you specify too many or too few names for the values, S-PLUS gives
an error message.
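
Once a vector has names, you can use them in place of numerical indices. The following sketch uses a hypothetical vector weights with three named values:

```r
weights <- c(450, 760, 325)
names(weights) <- c("case1", "case2", "case3")

weights["case2"]                # the element labeled case2
weights[c("case3", "case1")]    # several elements, in the order requested
names(weights)[weights > 400]   # labels of the values that exceed 400
```

Indexing by name keeps code readable and does not depend on the position of a value within the vector.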


STRUCTURES
Next in complexity after the atomic vectors are the structures, which,
as the name implies, extend vectors by imposing a structure, typically
a multi-dimensional array, upon the data.
The simplest structure is the two-dimensional matrix. A matrix starts
with a vector and then adds the information about how many rows
and columns the matrix contains. This information, the dimension, or
dim, of the matrix, is stored in a slot in the representation of the
matrix class. All structure classes have at least one slot, .Data, which
must contain a vector. The classes matrix and array have one
additional required slot, .Dim, to hold the dimension and one optional
slot, .Dimnames, to hold the names for the rows and columns of a
matrix and their analogues for higher dimensional arrays. Like simple
vectors, structure objects are atomic, that is, all of their values must be
of a single mode.

Matrices

Matrices are used to arrange values by rows and columns in a
rectangular table. For data analysis, different variables are usually
represented by different columns, and different cases or subjects are
represented by different rows. Thus, matrices are convenient for
grouping together observations that have been measured on the same
set of subjects and variables.
Matrices differ from vectors by having a .Dim slot, which specifies the
dimension of the matrix, that is, the number of rows and columns. Any
vector can be turned into a matrix simply by specifying its .Dim slot,
as we see in the examples below.

Creating Matrices

To create a matrix from an existing vector, use the dim function to set
the .Dim slot. To use dim, you assign a vector of two integers
specifying the number of rows and columns. For example:

> mat <- rep(1:4,rep(3,4))
> mat
 [1] 1 1 1 2 2 2 3 3 3 4 4 4
> dim(mat) <- c(3,4)
> mat
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    1    2    3    4
[3,]    1    2    3    4

More often, you need to combine several vectors or matrices into a
single matrix. To combine vectors (and matrices) into matrices, use
the functions cbind and rbind. The cbind function combines vectors
column by column, and rbind combines vectors row by row. You can
easily combine counts for a 2×3 contingency table using rbind:

> rbind(c(200688,24,33),c(201083,27,115))
       [,1] [,2] [,3]
[1,] 200688   24   33
[2,] 201083   27  115

Use the cbind function similarly for columns. When vectors of
different lengths are combined using cbind or rbind, the shorter ones
are replicated cyclically so that the matrix is “filled in.” If matrices are
combined, they must have matching numbers of rows when using
cbind and matching numbers of columns when using rbind.
Otherwise, S-PLUS prints an error message and the objects are not
combined.
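
The cyclic replication of a shorter vector can be sketched as follows. The object name m is hypothetical:

```r
# The second vector has length 2, so it is replicated to length 4.
m <- cbind(1:4, c(10, 20))
m
dim(m)      # 4 2

# A vector of length 2 matches the two columns of m, so rbind succeeds.
rbind(m, c(0, 0))
```

The second column of m contains 10, 20, 10, 20, showing that the shorter vector was reused from its beginning.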
Use the function matrix to convert objects to matrices. Combine the
values into a single vector using c and then group them by specifying
the number of columns or rows. To create a matrix from two vectors,
grp and thw, use matrix as follows:

> heart <- matrix(c(grp,thw),ncol=2)

If you provide fewer values as arguments to matrix than are required
to complete the matrix, the values are replicated cyclically until the
matrix is filled in. If you provide more data than necessary to
complete the matrix, excess values are discarded.
If either of ncol or nrow is provided but not both, the missing argument
is computed using the following relations:
• nrow = The smallest integer equal to or greater than the
number of values divided by the number of columns
• ncol = The smallest integer equal to or greater than the
number of values divided by the number of rows
Thus, nrow and ncol are computed to create the smallest matrix from
all the values when ncol or nrow is given individually.
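
The relation for nrow can be worked through by hand. In the sketch below, ten values and three columns give ceiling(10/3) = 4 rows. Padding the last two cells with NA is our own choice for illustration; matrix itself would fill them by cyclic replication. The names x, n.col, n.row, padded, and m are hypothetical:

```r
x <- 1:10
n.col <- 3

# Smallest integer equal to or greater than 10/3.
n.row <- ceiling(length(x)/n.col)      # 4

# Pad explicitly with NA so that no values are reused.
padded <- c(x, rep(NA, n.row*n.col - length(x)))
m <- matrix(padded, ncol = n.col)
dim(m)      # 4 3
```

Padding explicitly makes it obvious which cells hold real data, which can be preferable to cyclic replication when the counts do not divide evenly.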

By default, the values are placed in the matrix column by column.
That is, all the rows of the first column are filled, then the rows of the
second column are filled, etc. To fill the matrix row by row, set the
byrow argument to T. For example:

> matrix(1:12,ncol=3,byrow=T)
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
[4,] 10 11 12

The byrow argument is especially useful when reading in data from a
text file that is arranged in a table. The data are read in (with scan)
row by row in this case, so the byrow argument is used to place the
values in a matrix correctly.
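
As a sketch of this use of byrow, the following writes a small two-row table to a temporary file, reads it with scan, and rebuilds the table. The file and the names data.file, values, and tab are hypothetical:

```r
# A text file holding a table with 2 rows and 3 columns.
data.file <- tempfile()
cat("1 10 100", "2 20 200", file = data.file, sep = "\n")

# scan reads the numbers row by row into a single vector.
values <- scan(data.file)

# byrow=T places them back in the layout of the original table.
tab <- matrix(values, ncol = 3, byrow = T)
tab
```

Without byrow=T the same six values would be placed column by column, and the rows of the file would no longer correspond to the rows of the matrix.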

Naming Rows and Columns

For a vector you saw that you could assign names to each value with
the names function. For matrices, you can assign names to the rows
and columns with the dimnames function. To create a matrix with row
and column names of your own, create a list with two components,
one for rows and one for columns, and assign them using the
dimnames function.

> dimnames(mat) <- list(paste("row",letters[1:3]),
+ paste("col",LETTERS[1:4]))
> mat
      col A col B col C col D
row a     1     2     3     4
row b     1     2     3     4
row c     1     2     3     4

In the example above, letters and LETTERS are character vectors
with values the letters of the alphabet in lowercase and uppercase,
respectively. The character strings "row" and "col" are replicated to
match the length of vectors containing the letters for labeling. The
paste function binds values into a single character string.

To suppress either row or column labels, use the NULL value for the
corresponding component of the list. For example, to suppress the
row labels and number the columns:

> dimnames(mat) <- list(NULL, paste("col",1:4))
> mat
     col 1 col 2 col 3 col 4
[1,]     1     2     3     4
[2,]     1     2     3     4
[3,]     1     2     3     4

To specify the row and column labels when defining a matrix with
matrix, use the optional argument dimnames as follows:

> mat2 <- matrix(1:12, ncol=4,
+ dimnames=list(NULL,paste("col",1:4)))
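
Row and column names can also be used to extract values. The sketch below rebuilds the labeled matrix from the earlier example in a single call (an arrangement chosen here for self-containment) and indexes it by label:

```r
mat <- matrix(rep(1:4, rep(3, 4)), nrow = 3,
    dimnames = list(paste("row", letters[1:3]),
        paste("col", LETTERS[1:4])))

mat["row b", "col C"]     # 3
mat[, "col D"]            # the whole column labeled col D
dimnames(mat)[[2]]        # the column labels
```

The second component of the dimnames list holds the column labels, matching the order used when the list was assigned.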

Arrays

Arrays generalize matrices by extending the .Dim slot to more than
two dimensions. If the rows and columns of a matrix are the length
and width of a rectangular arrangement of equal-sized cubes, then
length, width, and height represent the dimensions of a three-way
array. You can visualize a series of equal-sized rectangles or cubes
stacked one on top of the other to form a three-dimensional box. The
box is composed of cells (the individual cubes) and each cell is
specified by its position along the length, width, and height of the
box.
An example of a three-dimensional array is the iris data set in
S-PLUS. The first two cases are presented here:

> iris[1:2,,]
, , Setosa
Sepal L. Sepal W. Petal L. Petal W.
[1,] 5.1 3.5 1.4 0.2
[2,] 4.9 3.0 1.4 0.2
, , Versicolor
Sepal L. Sepal W. Petal L. Petal W.
[1,] 7.0 3.2 4.7 1.4
[2,] 6.4 3.2 4.5 1.5
, , Virginica
Sepal L. Sepal W. Petal L. Petal W.
[1,] 6.3 3.3 6.0 2.5
[2,] 5.8 2.7 5.1 1.9

The data present 50 observations of sepal length and width and petal
length and width for each of three species of iris (Setosa, Versicolor,
and Virginica). The .Dim slot of iris represents the length, width, and
height in the box analogy:


> dim(iris)
[1] 50 4 3

There is no limit to the number of dimensions of an array. Additional
dimensions are represented in the .Dim slot as additional values in the
vector; the number of values is the number of dimensions. From this,
we can think of a matrix as a two-dimensional array and a vector as a
one-dimensional array.

Creating Arrays

To create an array in S-PLUS, use the array function. The array
function is analogous to matrix. It takes data and the appropriate
dimensions as arguments to produce the array. If no data are
supplied, the array is filled with NAs.
When passing values to array, combine them in a vector so that the
first dimension varies fastest, the second dimension the next fastest,
and so on. The following example shows how this works:

> array(c(1:8,11:18,111:118),dim=c(2,4,3))
, , 1
     [,1] [,2] [,3] [,4]
[1,]    1    3    5    7
[2,]    2    4    6    8

, , 2
     [,1] [,2] [,3] [,4]
[1,]   11   13   15   17
[2,]   12   14   16   18

, , 3
     [,1] [,2] [,3] [,4]
[1,]  111  113  115  117
[2,]  112  114  116  118

The first dimension (the rows) is incremented first. This is equivalent
to placing the values column by column. The second dimension (the
columns) is incremented second. The third dimension is incremented
by filling a matrix for each level of the third dimension.
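
The filling order can be checked with a little arithmetic. In this sketch (the names dims, a, i, j, k, and pos are hypothetical), the position of element [i, j, k] in the underlying vector is computed by hand and compared with the value stored in the array:

```r
dims <- c(2, 4, 3)
a <- array(1:24, dim = dims)

# Position of element [i, j, k] when the first dimension varies fastest.
i <- 2; j <- 3; k <- 1
pos <- i + (j - 1)*dims[1] + (k - 1)*dims[1]*dims[2]
pos            # 6
a[i, j, k]     # 6, because the data vector here is simply 1:24
```

Since the data are the integers 1 to 24, each cell contains its own position in the vector, which makes the ordering easy to verify.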
For creating arrays from existing vectors, the dim function works for
arrays in the same way it works for matrices. The dim function lets
you set the .Dim slot as you can for a matrix. For example, if the data
above were stored in the vector vec, you could create the above array
by defining the .Dim slot with the vector c(2,4,3):


> vec
[1] 1 2 3 4 5 6 7 8 11 12 13
[12] 14 15 16 17 18 111 112 113 114 115 116
[23] 117 118
> dim(vec) <- c(2,4,3)

To name each level of each dimension, use the dimnames argument to
array. This passes a list of names in the same way as is done for
matrices. For more information on dimnames, see Naming Rows and
Columns on page 30.


LISTS
A list is a completely flexible means for representing data. In earlier
versions of S, it was the standard means of combining arbitrary
objects into a single data object. Much the same effect can be created,
however, using the notion of slots.
Up to this point, all the data objects described have been atomic,
meaning they contain data of only one mode. Often, however, you
need to create objects that not only contain data of mixed modes but
also preserve the mode of each value.
For example, the slots of an array may contain both the dimension (a
numeric vector) and the dimension names in the .Dimnames slot (a list
of character vectors), and it is important to preserve those modes:

> attributes(iris)
$dim:
[1] 50 4 3

$dimnames:
$dimnames[[1]]:
character(0)

$dimnames[[2]]:
[1] "Sepal L." "Sepal W." "Petal L." "Petal W."

$dimnames[[3]]:
[1] "Setosa" "Versicolor" "Virginica"

The value returned by attributes is a simple example of an S-PLUS
list. Lists are a very general data type. They are made up of
components, where each component consists of one data object of any
type. That is, the mode and type of the object can change from
component to component.
For example, the attributes list for the iris data set consists of two
components, a dim component and a dimnames component. The dim
component, the value of the .Dim slot, is a numeric vector of length
three. The dimnames component, the value of the .Dimnames slot, is
another list with three components. The first component is an empty
character vector (character(0)), the second component is a vector

of four character strings indicating whether the measurement is sepal
length or width or petal length or width, and the third component is a
vector of three character strings specifying the species of iris.

Creating Lists

To create a list, use the list function. Each argument to list defines
a component of the list. Naming an argument, using the form
name=component, creates a name for the corresponding component.

For example, you can create a list from the two vectors grp and thw as
follows:

> grp <- c(rep(1,11),rep(2,10))
> thw <- c(450,760,325,495,285,450,460,375,310,615,425,245,
+ 350,340,300,310,270,300,360,405,290)
> heart.list <- list(group=grp, thw=thw,
+ descrip="heart data")
> heart.list
$group:
[1] 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2

$thw:
[1] 450 760 325 495 285 450 460 375 310 615 425 245 350
[14] 340 300 310 270 300 360 405 290

$descrip:
[1] "heart data"

The first component of the list contains a numeric vector with
grouping information for the data, so it is named group. The second
component is the total heart weight (thw) in grams. The name of the
component is the same as the name of the object stored in that
component. The thw on the left of the equal sign is the component
name, and the thw on the right of the equal sign is the object stored
there. The third component contains a character vector, which briefly
describes the data.
To access a list component, specify the name of the list and the name
of the component, separated by a dollar sign ($).
For example, to display the grouping data:

> heart.list$group
[1] 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2

More generally, you can access list components by an index number
enclosed in double brackets ([[]]). For example, the grouping
information can also be accessed by:

> heart.list[[1]]
[1] 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2

Once you’ve accessed a component, you can specify particular values
of the component in the usual way, using the single bracket ([])
notation. For example, since the group component is a vector, you
can obtain the 11th and 12th elements with:

> heart.list[[1]][11:12]
[1] 1 2

or

> heart.list$group[11:12]
[1] 1 2

If you define a list without naming the components, components can
be accessed only using the double bracket notation. When the
components are named, you can use either the double bracket
notation or the names convention with a $ separating the list name
and the component name.
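
The two access notations can be combined in a computation. The sketch below rebuilds heart.list from the values shown earlier and computes the mean total heart weight within each group; the name in.group1 is hypothetical:

```r
heart.list <- list(
    group = c(rep(1, 11), rep(2, 10)),
    thw = c(450,760,325,495,285,450,460,375,310,615,425,245,
        350,340,300,310,270,300,360,405,290),
    descrip = "heart data")

# Logical vector marking the cases in group 1.
in.group1 <- heart.list$group == 1

mean(heart.list$thw[in.group1])       # 450
mean(heart.list[[2]][!in.group1])     # 317
```

Here heart.list$thw and heart.list[[2]] refer to the same component, so either form can be followed by single bracket subscripts.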

Naming Components

The names of a list’s components can be changed by assigning them
with the names function:

> names(heart.list) <- c("group","total heart weight",
+ "descrip")
> names(heart.list)
[1] "group" "total heart weight" "descrip"
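
A component name that contains spaces can still be used for access if you give it as a character string inside double brackets. The following is a minimal sketch with a shortened, hypothetical version of heart.list:

```r
heart.list <- list(group = 1:2, thw = c(450, 245),
    descrip = "heart data")

# Rename only the second component.
names(heart.list)[2] <- "total heart weight"

# Access the renamed component by its full name.
heart.list[["total heart weight"]]
```

Assigning to a single element of names(heart.list) changes one name and leaves the others as they were.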


FACTORS AND ORDERED FACTORS


In data analysis, many kinds of data are qualitative rather than
quantitative or numeric. If observations can be assigned only to a
category, rather than given a specific numeric value, they are termed
qualitative or categorical. The values assigned to these variables are
typically short character descriptions of the category to which the
observation belongs. The following lists some examples of categorical
variables:
• Gender, where the values are male and female.
• Marital status, where the values might be single, married,
separated, and divorced.
• Experimental status, where the values might be treatment and
control.

Categorical data in S-PLUS is represented with a data type called a
factor. The built-in data frame fuel.frame has a variable named
Type that classifies each automobile as one of Small, Sporty, Compact,
Medium, Large, or Van.

> fuel.frame$Type
[1] Small Small Small Small Small Small Small
[8] Small Small Small Small Small Small Sporty
[15] Sporty Sporty Sporty Sporty Sporty Sporty Sporty
[22] Sporty Compact Compact Compact Compact Compact Compact
[29] Compact Compact Compact Compact Compact Compact Compact
[36] Compact Compact Medium Medium Medium Medium Medium
[43] Medium Medium Medium Medium Medium Medium Medium
[50] Medium Large Large Large Van Van Van
[57] Van Van Van Van

When you print a factor, the values correspond to the level of the
factor for each data point or observation. Internally, a factor keeps
track of the levels or different categorical values contained in the data
and indices that point to the appropriate level for each data point.
The different levels of a factor are stored in an attribute called levels.
Factor objects are a natural form for categorical data in an object-
oriented programming environment because they have a class
attribute that allows specific method functions to be developed for
them. For example, the generic print function uses the print.factor
method to print factors. If you override print.factor by calling
print.default, you can see how a factor is stored internally.

> print.default(fuel.frame$Type)
[1] 4 4 4 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 1 1 1
[26] 1 1 1 1 1 1 1 1 1 1 1 1 3 3 3 3 3 3 3 3 3 3 3 3 3
[51] 2 2 2 6 6 6 6 6 6 6
attr(, "levels"):
[1] "Compact" "Large" "Medium" "Small" "Sporty" "Van"
attr(, "class"):
[1] "factor"

The integers serve as indices to the values in the levels attribute. You
can return the integer indices directly with the codes function.

> codes(fuel.frame$Type)
[1] 4 4 4 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 1 1 1
[26] 1 1 1 1 1 1 1 1 1 1 1 1 3 3 3 3 3 3 3 3 3 3 3 3 3
[51] 2 2 2 6 6 6 6 6 6 6

Or, you can examine the levels of a factor with the levels function.

> levels(fuel.frame$Type)
[1] "Compact" "Large" "Medium" "Small" "Sporty" "Van"

The print.factor function is roughly equivalent to

> levels(fuel.frame$Type)[codes(fuel.frame$Type)]

except that the quotes are dropped. To get the number of cases of
each level in a factor, call summary:

> summary(fuel.frame$Type)
Compact Large Medium Small Sporty Van
15 3 13 13 9 7
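The relationship between the levels and the integer indices can be sketched with ordinary vector functions. The following is only an illustration on made-up values (the vector x and the names lev, idx, and rebuilt are hypothetical); it mimics the stored pieces of a factor rather than showing how S-PLUS builds one.

```r
# Hypothetical values, not taken from fuel.frame
x <- c("Small", "Van", "Small", "Large", "Van", "Small")

lev <- sort(unique(x))   # the distinct values, sorted: "Large" "Small" "Van"
idx <- match(x, lev)     # position of each value in lev: 2 3 2 1 3 2

# Indexing the levels by the integer indices rebuilds the original values
rebuilt <- lev[idx]

# factor() derives the same default levels from the data
f <- factor(x)
levels(f)                # "Large" "Small" "Van"
```

Storing each distinct value once and pointing to it by position is what lets a factor print as labels while being held as integers.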

Creating Factors

To create a factor, use the factor function. The factor function takes
data with categorical values and creates a data object of class factor.
For example, you can categorize a group of 10 students by gender as
follows:

> classlist <- c("male", "female", "male", "male", "male",
+ "female", "female", "male", "female", "male")
> factor(classlist)
[1] male female male male male female female male
[9] female male

S-PLUS creates two levels with labels female and male, respectively.

Table 2.2: Arguments to factor.

Argument   Description

x          Data, to be thought of as taking values on the finite set of
           levels.

levels     Optional vector of levels for the factor. The default value of
           levels is the sorted list of distinct values of x.

labels     Optional vector of values to use as labels for the levels of
           the factor. The default is as.character(levels).

exclude    A vector of values to be excluded from forming levels.

The levels argument allows you to specify the levels you want to use
or to order them the way you want. For example, if you want to
include only certain categories in an analysis, you can specify them with
the levels argument. Any values omitted from the levels argument
are considered missing.

> intensity <- factor(c("Hi","Med","Lo","Hi","Hi","Lo"),
+ levels = c("Lo","Hi"))
> intensity
[1] Hi NA Lo Hi Hi Lo
> levels(intensity)
[1] "Lo" "Hi"

If you had left the levels argument off, the levels would have been
ordered alphabetically as Hi, Lo, Med. Use the labels argument if you
want the levels to be labeled with something other than the original
data values.

> factor(c("Hi","Lo","Med","Hi","Hi","Lo"),
+ levels=c("Lo","Hi"), labels = c("LowDose","HighDose"))
[1] HighDose LowDose NA HighDose HighDose LowDose

Warning

If you provide the levels and labels arguments, then you must order them in the same way. If
you don’t provide the levels argument but do provide the labels argument, then you must
order the labels the same way S-PLUS orders the levels of the factor, which is alphabetically for
character strings and numerically for a numeric vector that is converted to a factor.
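The pitfall described in this warning can be sketched as follows. The data and the names dose, right, and wrong are hypothetical; the point is only that labels are matched to levels by position, so a mismatch produces no error.

```r
dose <- c("Lo", "Hi", "Lo")   # hypothetical data

# With no levels argument the levels are sorted, so "Hi" comes before
# "Lo" and the labels must be listed in that same order.
right <- factor(dose, labels = c("HighDose", "LowDose"))
levels(right)   # "HighDose" "LowDose"

# Listing the labels in the order you have in mind (low, then high)
# raises no error but attaches each label to the wrong level.
wrong <- factor(dose, labels = c("LowDose", "HighDose"))

right   # LowDose  HighDose LowDose
wrong   # HighDose LowDose  HighDose
```

Supplying levels and labels together, in matching order, avoids having to remember the default sort order.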

Use the exclude argument to indicate which values to exclude from
the levels of the resulting factor. Any value that appears in both x and
exclude will be NA in the result and will not appear in the levels
attribute. A factor with the same values as intensity (but with its
remaining levels in the default sorted order, Hi then Lo) could
alternatively have been produced with:

> factor(c("Hi","Med","Lo","Hi","Hi","Lo"),
+ exclude =c("Med"))
[1] Hi NA Lo Hi Hi Lo
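As a small check on the two approaches, the sketch below builds the factor both ways and compares the results. The names by.levels and by.exclude are hypothetical. The missing-value pattern is the same, while the order of the levels differs because exclude leaves the default sorted order in place.

```r
x <- c("Hi", "Med", "Lo", "Hi", "Hi", "Lo")

by.levels  <- factor(x, levels = c("Lo", "Hi"))   # levels named explicitly
by.exclude <- factor(x, exclude = "Med")          # unwanted value named instead

is.na(by.levels)     # F T F F F F
is.na(by.exclude)    # F T F F F F

levels(by.levels)    # "Lo" "Hi"
levels(by.exclude)   # "Hi" "Lo"
```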

Creating Ordered Factors

If the order of the levels of a factor is important, you can represent the
data as a special type of factor called an ordered factor. Use the ordered
function to create ordered factors. The arguments to ordered are the
same as those to factor. To create an ordered version of the intensity
factor, do:

> ordered(c("Hi","Med","Lo","Hi","Hi","Lo"),
+ levels=c("Lo","Med","Hi"))
[1] Hi Med Lo Hi Hi Lo
Lo < Med < Hi

The order relationship between the different levels is printed for an
ordered factor along with the values. The order of the values used in
the levels argument determines the order placed on the levels.

Warning

If you don’t provide a levels argument, an ordering will be placed on the levels corresponding
to the default ordering of the levels by S-PLUS.
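To see why this matters, the following sketch compares the default ordering with an explicit one for the same hypothetical data. The names default.order and explicit.order are made up for the illustration.

```r
x <- c("Hi", "Med", "Lo", "Hi", "Hi", "Lo")

# No levels argument: the ordering follows the sorted values,
# which here means Hi < Lo < Med.
default.order <- ordered(x)
levels(default.order)    # "Hi" "Lo" "Med"

# Explicit levels: the ordering follows the order you list.
explicit.order <- ordered(x, levels = c("Lo", "Med", "Hi"))
levels(explicit.order)   # "Lo" "Med" "Hi"
```

For character data the alphabetical default rarely matches the intended ranking, so it is usually safer to give levels explicitly.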

Creating Factors From Continuous Data

To create categorical data out of numerical or continuous data, use the
cut function. You provide either a vector of specific breakpoints or an
integer specifying how many groups to divide the numerical data
into; cut then creates levels corresponding to the specified ranges. All
the values falling in any particular range are assigned the same level.
For example, the murder rates in the 50 states can be grouped into
High and Low values using cut:

> cut(state.x77[,"Murder"],breaks=c(0,8,16))
[1] 2 2 1 2 2 1 1 1 2 2 1 1 2 1 1 1 2 2 1 2 1 2 1 2 2
[26] 1 1 2 1 1 2 2 2 1 1 1 1 1 1 2 1 2 2 1 1 2 1 1 1 1
attr(, "levels"):
[1] " 0+ thru 8" "8+ thru 16"

The breakpoints must completely enclose the values you want
included in the factors. Data less than or equal to the first breakpoint or
greater than the last breakpoint are returned as NA.

To create a specific number of groups, by partitioning the range of the
data into equal-sized intervals, use an integer value for the breaks
argument:

> cut(state.x77[,"Murder"], breaks=2)
[1] 2 2 1 2 2 1 1 1 2 2 1 1 2 1 1 1 2 2 1 2 1 2 1 2 2
[26] 1 1 2 1 1 2 2 2 1 1 1 1 1 1 2 1 2 2 1 1 2 1 1 1 1
attr(, "levels"):
[1] "1.263+ thru 8.250" "8.250+ thru 15.237"

By default, cut creates labels of the form first breakpoint thru
second breakpoint, etc., using either the breakpoints you provide or
the ones it creates. However, you can assign different labels to the
levels with the labels argument.

> cut(state.x77[,"Murder"],c(0,8,16),
+ labels=c("Low","High"))
[1] 2 2 1 2 2 1 1 1 2 2 1 1 2 1 1 1 2 2 1 2 1 2 1 2 2
[26] 1 1 2 1 1 2 2 2 1 1 1 1 1 1 2 1 2 2 1 1 2 1 1 1 1
attr(, "levels"):
[1] "Low" "High"


Note

As you may notice from the style of printing in the above examples, cut does not produce factors
directly. Rather, the value returned by cut is a category object.

To create a factor from the output of cut, just call factor with the call
to cut as its only argument:

> factor(cut(state.x77[,"Murder"], c(0,8,16),
+ labels=c("Low","High")))
[1] High High Low High High Low Low Low High High
[11] Low Low High Low Low Low High High Low High
[21] Low High Low High High Low Low High Low Low
[31] High High High Low Low Low Low Low Low High
[41] Low High High Low Low High Low Low Low Low
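The grouping rule that cut applies to explicit breakpoints (greater than the lower breakpoint, less than or equal to the upper one, NA outside the range) can be written out by hand. This is only a sketch of the rule on hypothetical data, not the implementation of cut; rate, breaks, group, and f are illustrative names.

```r
rate <- c(0, 2.5, 8, 8.1, 15.1, 16, 20)   # hypothetical values
breaks <- c(0, 8, 16)

# Start with every value unassigned (NA), then fill in group i for
# values in the interval (breaks[i], breaks[i+1]].
group <- rep(NA, length(rate))
for(i in 1:(length(breaks) - 1))
    group[rate > breaks[i] & rate <= breaks[i + 1]] <- i

group   # NA 1 1 2 2 2 NA

# Attach labels to the group numbers to obtain a factor
f <- factor(group, levels = 1:2, labels = c("Low", "High"))
```

Note that 0 falls outside the first interval and becomes NA, which is why the breakpoints must enclose the data rather than start at its minimum.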

CHAPTER 3 DATA FRAMES

Introduction 44
The Benefits of Data Frames 45
Creating Data Frames 46
Rectangular Data Functions 50
Combining Data Frames 52
Combining Data Frames by Column 52
Combining Data Frames by Row 54
Merging Data Frames 56
Converting Data Frames 58
Applying Functions to Subsets of a Data Frame 59
Adding New Classes of Variables to Data Frames 65
Data Frame Attributes 68

Chapter 3 Data Frames

INTRODUCTION
Data frames are data objects designed primarily for data analysis and
modeling. You can think of them as generalized matrices—generalized
in a way different from the way arrays generalize matrices. Arrays
generalize the dimensional aspect of a matrix; data frames generalize
the mode aspect of a matrix. Matrices can be of only one mode (for
example, "logical", "numeric", "complex", "character"). Data
frames, however, allow you to mix modes from column to column.
For example, you could have a column of "character" values, a
column of "numeric" values, a column of categorical values, and a
column of "logical" values. Each column of a data frame
corresponds to a particular variable; each row corresponds to a single
“case” or set of observations.
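The contrast between a matrix and a data frame can be seen in a small sketch. The objects m and d and their contents are hypothetical.

```r
# A matrix holds a single mode: mixing numbers and strings
# converts every element to character.
m <- cbind(1:3, c("a", "b", "c"))
mode(m)              # "character"

# A data frame keeps each column as its own kind of variable.
d <- data.frame(count = 1:3,
                weight = c(2.5, 3.1, 4.0),
                group = factor(c("a", "b", "a")))
mode(d$weight)       # "numeric"
is.factor(d$group)   # T
dim(d)               # 3 rows (cases) and 3 columns (variables)
```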


THE BENEFITS OF DATA FRAMES


The main benefit of a data frame is that it allows you to mix data of
different types into a single object in preparation for analysis and
modeling. The idea of a data frame is to group data by variables
(columns) regardless of their type. Then all the observations on a
particular set of variables can be grouped into a single data frame.
This is particularly useful in data analysis where it is typical to have a
"character" variable labeling each observation, one or more
"numeric" variables of observations, and one or more categorical
variables for grouping observations. An example is the built-in data set
kyphosis, which records, for each of a group of patients, whether a
spinal deformity is present after surgery, together with three numeric
variables.

> sampleruns <- sample(row.names(kyphosis),10)
> kyphosis[ sampleruns,]
Kyphosis Age Number Start
63 present 130 4 1
4 absent 2 5 1
39 absent 1 3 9
55 present 139 10 6
21 absent 27 4 9
46 absent 61 4 1
23 present 105 6 5
68 absent 17 4 10
18 absent 78 6 15
36 absent 112 3 16

A sample of 10 of the 81 observations is presented for all four
variables. The variable Kyphosis is the outcome, which indicates
whether the post-operative deformity is present or absent. The row
names on the left identify the individual cases. Combined
in kyphosis are character data (the row names), categorical data
(Kyphosis), and numeric data (Age, Number, and Start).


CREATING DATA FRAMES


You can create data frames in several ways.
• read.table reads in data from an external file.
• data.frame binds together S-PLUS objects of various kinds.
• as.data.frame coerces objects of a particular type to objects
of class data.frame.
You can also combine existing data frames in several ways, using the
cbind, rbind, and merge functions.
The read.table function reads data stored in a text file in table
format directly into S-PLUS. It is discussed in detail in Data Input on
page 130. The as.data.frame function is primarily a support function
for the top-level data.frame function—it provides a mechanism for
defining how new variable classes should be included in newly-
constructed data frames. This mechanism is discussed further in the
section Adding New Classes of Variables to Data Frames (page 65).
For most purposes, when you want to create or modify data frames
within S-PLUS, you use the data.frame function or one of the
combining functions cbind, rbind or merge. This section focuses
specifically on the data.frame function for combining S-PLUS objects
into data frames. The following section discusses the functions for
combining existing data frames.
The data.frame function is used for creating data frames from
existing S-PLUS data objects rather than from data in an external text
file. The only required argument to data.frame is one or more data
objects. All of the objects must produce columns of the same length.
Vectors must have the same number of observations as the number of
rows of the data frame, matrices must have the same number of rows
as the data frame, and lists must have components that match in
lengths for vectors or rows for matrices. If the objects don’t match
appropriately, you get an error message saying the "arguments imply
differing number of rows". For example, suppose we have vectors
of various modes, each having length 20, along with a matrix with
two columns and 20 rows, and a data frame with 20 observations for
each of three variables. We can combine these into a data frame as
follows:

> my.logical <- sample(c(T,F), size=20, replace=T)
> my.complex <- rnorm(20) + runif(20)*1i
> my.numeric <- rnorm(20)
> my.matrix <- matrix(rnorm(40), ncol=2)
> my.df <- kyphosis[1:20, 1:3]
> my.df2 <- data.frame(my.logical, my.complex, my.numeric,
+ my.matrix, my.df)
> my.df2
my.logical my.complex my.numeric my.matrix
1 F 0.6089225+0.9225925840i -3.10164384 0.88012586
2 F -2.0819531+0.0336728902i -0.55111325 0.27279513
3 T 0.8878321+0.9771889383i -0.72223763 0.84707218
4 F 0.7471270+0.5487224348i -0.27917610 2.00179088
5 T 1.1005395+0.0631634402i 0.15104893 -0.68347069
6 T 0.3485193+0.1848195572i -0.44838152 -0.47493315
7 F 1.6454204+0.6908840677i 0.44405148 1.18727220
8 F 1.4330907+0.0004898258i 0.04847902 -2.17772281
9 F -0.8531461+0.9480914157i 0.14967287 -2.25494105
10 T 0.8741626+0.1823104322i -1.39863545 -3.22639704
11 F -0.2090367+0.0066157957i -0.23842813 -0.36280708
12 F 1.1757802+0.6762467814i 0.32989672 -0.86669093
13 F -0.3004673+0.3170390362i -1.68374843 0.01818504
14 T -1.4100635+0.3889551479i -1.27312064 0.35701916
16 F 0.6641272+0.2191691762i -0.60716481 -0.40695505
17 T -0.1866811+0.8029941772i -1.01767418 -1.53522281
18 F 0.8642104+0.6114265160i -0.07657414 0.23754561
19 F -0.4507343+0.6050521806i -0.38748806 0.25455890
20 F -1.8629536+0.7581159561i -1.07376663 -0.16346027
21 T -1.0725881+0.5973116844i 1.91706202 0.42669240

my.matrix Kyphosis Age Number


1 -0.023943300 absent 71 3
2 -1.301475283 absent 158 3
3 -1.396698458 present 128 4
4 0.384949812 absent 2 5
5 -0.639857710 absent 1 4
6 1.134441750 absent 1 2
7 -1.902422316 absent 61 2
8 -0.058446250 absent 37 3
9 0.126896172 absent 113 2
10 0.795556284 present 59 6
11 0.593684564 present 82 5
12 0.291224646 absent 148 3
13 -0.162832145 absent 18 5
14 0.248051730 absent 1 4
16 -0.957828145 absent 168 3
17 0.051553058 absent 1 3
18 -0.294367576 absent 78 6
19 -0.001231745 absent 175 5
20 -0.225155320 absent 80 5
21 -0.192293286 absent 27 4

The names of the objects are used for the variable names in the data
frame. Row names for the data frame are obtained from the first
object with a names, dimnames, or row.names attribute having unique
values. In the above example, the object was my.df:

> my.df
Kyphosis Age Number
1 absent 71 3
2 absent 158 3
3 present 128 4
4 absent 2 5
5 absent 1 4
6 absent 1 2
7 absent 61 2
8 absent 37 3
9 absent 113 2
10 present 59 6
11 present 82 5
12 absent 148 3
13 absent 18 5
14 absent 1 4
16 absent 168 3
17 absent 1 3
18 absent 78 6
19 absent 175 5
20 absent 80 5
21 absent 27 4

The row names are not just the row numbers—in our subset, the
number 15 is missing. The fifteenth row of kyphosis, and hence
my.df, has the row name "16".


The attributes of special types of vectors (such as factors) are not lost
when they are combined in a data frame. They can be retrieved by
asking for the attributes of the particular variable of interest. More
detail is given in the section Data Frame Attributes (page 68).
Each vector adds one variable to the data frame. Matrices and data
frames provide as many variables to the new data frame as they have
columns or variables, respectively. Lists, because they can be built
from virtually any data object, are more complicated—they provide as
many variables as all of their components taken together.
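The way each kind of object contributes variables can be checked with a small sketch. The objects v, m, d, and combined are hypothetical.

```r
v <- c(10, 20, 30, 40)                  # a vector: one variable
m <- matrix(1:8, ncol = 2)              # a matrix: one variable per column
d <- data.frame(a = 1:4,                # a data frame: one per variable
                b = c(1.5, 2.5, 3.5, 4.5))

combined <- data.frame(v, m, d)
dim(combined)   # 4 rows and 1 + 2 + 2 = 5 variables
```

All three inputs have four rows, which is what allows them to be combined without the differing-number-of-rows error described earlier.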
When combining objects of different types into a data frame, some
objects may be altered somewhat to be more suitable for further
analysis. For example, numeric vectors and factors remain unchanged
in the data frame. Character and logical vectors, however, are
converted to factors before being included in the data frame. The
conversion is done because S-PLUS assumes that character and logical
data will most commonly be taken to be a categorical variable in any
modeling that is to follow. If you want to keep a character or logical
vector “as is” in the data frame, pass the vector to data.frame
wrapped in a call to the I function, which returns the vector
unchanged but with the added class "AsIs".
For example, consider the following logical vector, my.logical:

> my.logical
[1] T T T T T F T T F T T F T F T T T T T T

We can combine it as is with a numeric vector rnorm(20) in a data
frame by wrapping it in a call to I, as follows:

> my.df <- data.frame(a=rnorm(20), b=I(my.logical))
> my.df
a b
1 -0.6960192 T
2 0.4342069 T
3 0.4512564 T
4 -0.8785964 T
5 0.8857739 T
6 -0.2865727 F
7 -1.0415919 T
8 -2.2958470 T
9 0.7277701 F
10 -0.6382045 T
11 -0.9127547 T
12 0.1771526 F
13 0.5361920 T
14 0.3633339 F
15 0.5164660 T
16 0.4362987 T
17 -1.2920592 T
18 0.8314435 T
19 -0.6188006 T
20 1.4910625 T

> mode(my.df$b)
[1] "logical"

You can provide a character vector as the row.names argument to
data.frame. Just make sure it is the same length as the data objects
you are combining into the data frame.

> data.frame(price,country,reliab,mileage,type,
+ row.names=c("Acura","Audi","BMW","Chev","Ford",
+ "Mazda","MazdaMX","Nissan","Olds","Toyota"))
price country reliab mileage type
Acura 11950 Japan 5 NA Small
Audi 26900 Germany NA NA Medium
. . .

Rectangular Data Functions

Rectangular data functions allow you to access all rectangular data
objects in the same way. Rectangular data objects include matrices,
data frames, and atomic vectors, all of which have the form of rows
(observations) and one or more columns (variables).
There are eight rectangular data functions you can use:
• as.rectangular converts any object to a rectangular data
object (generally a data frame).
• as.char.rect takes a rectangular object and returns a
rectangular object consisting of character strings, suitable for
printing (but not formatted to fixed width).
• is.rectangular tests whether an object is rectangular.
• sub is used for subscripting.
• numRows and numCols count the number of rows and columns,
respectively.
• rowIds and colIds return the row and column names,
respectively.
numRows, numCols, rowIds, and colIds can also be used on the left
side of assignments. For more information on any of these functions,
type

help(function)

where function is one of the rectangular data functions listed above.


COMBINING DATA FRAMES


We have already seen one way to combine data frames—since data
frames are legal inputs to the data.frame function, you can use
data.frame directly to combine one or more data frames. For certain
specific combinations, other functions may be more appropriate. This
section discusses three general cases:
1. Combining data frames by column. This case arises when you
have new variables to add to an existing data frame, or have
two or more data frames having observations of different
variables for identical subjects. The principal tool in this case
is the cbind function.
2. Combining data frames by row. This case arises when you
have multiple studies providing observations of the same
variables for different sets of subjects. For this task, use the
rbind function.

3. Merging (or joining) data frames. This case arises when you
have two data frames containing some information in
common, and you want to get as much information as
possible from both data frames about the overlapping cases.
For this case, use the merge function.
All three of the functions mentioned above (cbind, rbind, and merge)
have methods for data frames, but in the usual cases, you can simply
call the generic function and obtain the correct result.

Combining Data Frames by Column

Suppose you have a data frame consisting of factor variables defining
an experimental design. When the experiment is complete, you can
add the vector of observed responses as another variable in the data
frame. In this case, you are simply adding another column to the
existing data frame, and the natural tool for this in S-PLUS is the cbind
function. For example, consider the simple built-in design matrix
oa.4.2p3, representing a half-fraction of a 2^3 design.

> oa.4.2p3
A B C
1 A1 B1 C1
2 A1 B2 C2
3 A2 B1 C2
4 A2 B2 C1

If we run an experiment with this design, we obtain a vector of length
four, one observation for each row of the design data frame. We can
combine the observations with the design using cbind as follows.

> run1 <- cbind(oa.4.2p3, resp=c(46, 34, 44, 30))
> run1
A B C resp
1 A1 B1 C1 46
2 A1 B2 C2 34
3 A2 B1 C2 44
4 A2 B2 C1 30

Another use of cbind is to bind a constant vector to a data frame, as in
the following example.

> fuel1 <- cbind(1, fuel.frame)
> fuel1
X1 Weight Disp. Mileage Fuel Type
Eagle Summit 4 1 2560 97 33 3.030303 Small
Ford Escort 4 1 2345 114 33 3.030303 Small
Ford Festiva 4 1 1845 81 37 2.702703 Small
Honda Civic 4 1 2260 91 32 3.125000 Small
Mazda Protege 4 1 2440 113 32 3.125000 Small
. . .

As a more substantial example, consider the built-in data sets
cu.summary, cu.specs, and cu.dimensions. Each of these data sets
contains observations about a number of car models, but the list of
car models is slightly different in each. All, however, contain data for
the cars listed in the data set common.names.

> common.names
[1] "Acura Integra" "Acura Legend"
[3] "Audi 100" "Audi 80"
[5] "BMW 325i" "BMW 535i"
[7] "Buick Century" "Buick Electra"
. . .

The data sets match.summary, match.specs, and match.dims contain
the row subscripts to obtain observations about the models listed in
common.names from, respectively, cu.summary, cu.specs, and
cu.dimensions. We can use these data sets and the cbind function to
compile a general car information data set.

> car.mine <- cbind(cu.dimensions[match.dims,],
+ cu.specs[match.specs,], cu.summary[match.summary,],
+ row.names=common.names)

Compare car.mine to the built-in data set car.all, constructed in a
similar fashion.
You can get statistics on individual columns by running any of the
following four functions in S-PLUS:
• colMeans
• colSums
• colVars
• colStdevs
which return the mean, sum, variance, and standard deviation,
respectively, for the specified column or columns.
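A small sketch shows what these column summaries compute. The matrix m is hypothetical, and the apply calls are shown as an assumed equivalent of the variance and standard deviation functions, not as their implementation.

```r
m <- matrix(c(1, 2, 3, 4, 5, 6), ncol = 2)   # columns: 1,2,3 and 4,5,6

colMeans(m)               # 2 5
colSums(m)                # 6 15

# The same kind of summary written with apply over columns (margin 2)
apply(m, 2, mean)         # 2 5
apply(m, 2, var)          # 1 1  (per-column variance)
sqrt(apply(m, 2, var))    # 1 1  (per-column standard deviation)
```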

Combining Data Frames by Row

Suppose you are pooling the data from several research studies. You
have data frames with observations of equivalent, or roughly
equivalent, variables for several sets of subjects. Renaming variables
as necessary, you can subscript the data sets to obtain new data sets
having a common set of variables. You can then use rbind to obtain a
new data frame containing all the observations from the studies.
For example, consider the following data frames.

> rand.df1 <- data.frame(norm=rnorm(10), unif=runif(10),
+ binom=rbinom(10,10,0.5))
> rand.df1
norm unif binom
1 1.64542042 0.45375156 41
2 1.64542042 0.83783769 44
3 -0.13593118 0.31408490 53
4 0.26271524 0.57312325 34
5 -0.01900051 0.25753044 47
6 0.14986005 0.35389326 41
7 0.07429523 0.53649764 43
8 -0.80310861 0.06334192 38
9 0.47110022 0.24843933 44
10 -1.70465453 0.78770638 45
> rand.df2 <- data.frame(norm=rnorm(5), binom=rbinom(5,10,0.5),
+ chisq=rchisq(5,10))
> rand.df2
norm binom chisq
1 0.3485193 50 19.359238
2 1.6454204 41 13.547288
3 1.4330907 53 4.968438
4 -0.8531461 55 4.458559
5 0.8741626 47 2.589351

These data frames have the common variables norm and binom; we
subscript and combine the resulting data frames as follows.

> rbind(rand.df1[,c("norm","binom")],
+ rand.df2[,c("norm", "binom")])
norm binom
1 1.64542042 41
2 1.64542042 44
3 -0.13593118 53
4 0.26271524 34
5 -0.01900051 47
6 0.14986005 41
7 0.07429523 43
8 -0.80310861 38
9 0.47110022 44
10 -1.70465453 45
11 0.34851926 50
12 1.64542042 41
13 1.43309068 53
14 -0.85314606 55
15 0.87416262 47


Warning

Use rbind (and, in particular, rbind.data.frame) only when you have complete data frames, as
in the above example. Do not use it in a loop to add one row at a time to an existing data frame—
this is very inefficient. To build a data frame, write all the observations to a data file and use
read.table to read it in.
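When the observations are generated inside S-PLUS rather than read from a file, another way to avoid row-by-row rbind calls is to fill ordinary vectors in the loop and call data.frame once at the end. This is an alternative sketch, not the approach the warning recommends, and the names n, x, y, and result are hypothetical.

```r
n <- 5
x <- numeric(n)            # preallocate one vector per variable
y <- numeric(n)

for(i in 1:n) {
    x[i] <- i              # hypothetical observation i
    y[i] <- i^2
}

# One call to data.frame after the loop, instead of n calls to rbind
result <- data.frame(x = x, y = y)
dim(result)                # 5 rows, 2 variables
```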

You can get basic statistics on individual rows by running any of the
following four functions in S-PLUS:
• rowMeans
• rowSums
• rowVars
• rowStdevs
which return the mean, sum, variance, and standard deviation,
respectively, for the specified row or rows.

Merging Data Frames

In many situations, you may have data from multiple sources with
some duplicated data. To get the cleanest possible data set for
analysis, you want to merge or join the data before proceeding with the
analysis. For example, player statistics extracted from Total Baseball
overlap somewhat with player statistics extracted from The Baseball
Encyclopedia. You can use the merge function to join two data frames
by their common data. For example, consider the following made-up
data sets.

> baseball.off
player years.ML BA HR
1 Whitehead 4 0.308 10
2 Jones 3 0.235 11
3 Smith 5 0.207 4
4 Russell NA 0.270 19
5 Ayer 7 0.283 5
> baseball.def
player years.ML A FA
1 Smith 5 300 0.974
2 Jones 3 7 0.990
3 Whitehead 4 9 0.980
4 Russell NA 55 0.963
5 Ayer 7 532 0.955

These can be merged by the two columns they have in common using
merge:

> merge(baseball.off, baseball.def)
player years.ML BA HR A FA
1 Ayer 7 0.283 5 532 0.955
2 Jones 3 0.235 11 7 0.990
3 Russell NA 0.270 19 55 0.963
4 Smith 5 0.207 4 300 0.974
5 Whitehead 4 0.308 10 9 0.980

By default, merge joins by the columns having common names in the
two data frames. You can specify different combinations using the by,
by.x, and by.y arguments. For example, consider the data sets
authors and books.

> authors
FirstName LastName Age Income Home
1 Lorne Green 82 1200000 California
2 Loren Blye 40 40000 Washington
3 Robin Green 45 25000 Washington
4 Robin Howe 2 0 Alberta
5 Billy Jaye 40 27500 Washington

> books
AuthorFirstName AuthorLastName Book
1 Lorne Green Bonanza
2 Loren Blye Midwifery
3 Loren Blye Gardening
4 Loren Blye Perennials
5 Robin Green Who_dun_it?
6 Rich Calaway Splus

The data sets have different variable names, but overlapping
information. Using the by.x and by.y arguments to merge, we can
join the data sets by the first and last names:

> merge(authors, books, by.x=c("FirstName", "LastName"),
+ by.y=c("AuthorFirstName", "AuthorLastName"))
FirstName LastName Age Income Home Book
1 Loren Blye 40 40000 Washington Midwifery
2 Loren Blye 40 40000 Washington Gardening
3 Loren Blye 40 40000 Washington Perennials
4 Lorne Green 82 1200000 California Bonanza
5 Robin Green 45 25000 Washington Who_dun_it?

Because the desired “by” columns are in the same position in both
books and authors, we can accomplish the same result more simply
as follows.

> merge(authors, books, by=1:2)

More examples can be found in the merge help file.
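As the examples above suggest, rows whose key values appear in only one of the two data frames are left out of the merged result. A small sketch with hypothetical data frames off and def makes this visible.

```r
# Hypothetical data: "Jones" appears only in off, "Lee" only in def
off <- data.frame(player = c("Smith", "Jones", "Ayer"),
                  HR = c(4, 11, 5))
def <- data.frame(player = c("Ayer", "Smith", "Lee"),
                  FA = c(0.955, 0.974, 0.990))

# The only common column name is player, so it is used as the key
both <- merge(off, def)
both   # one row each for Ayer and Smith, with HR and FA side by side
```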

Converting Data Frames

You may want to convert an S-PLUS data frame to a matrix. If so,
there are three different functions which take a data frame as an
argument and return a matrix whose elements correspond to the
elements of the data frame:
• as.matrix.data.frame
• numerical.matrix
• data.matrix


APPLYING FUNCTIONS TO SUBSETS OF A DATA FRAME


A common operation on data with factor variables is to repeat an
analysis for each level of a single factor, or for all combinations of
levels of several factors. SAS users are familiar with this operation as
the BY statement. In S-PLUS, you can perform these operations using
the by or aggregate function. Use aggregate when you want numeric
summaries of each variable computed for each level; use by when you
want to use all the data to construct a model for each level.
The aggregate function allows you to partition a data frame or a
matrix by one or more grouping vectors, and then apply a function to
the resulting columns. The function must be one that returns a single
value, such as mean or sum. You can also use aggregate to partition a
time series (univariate or multivariate) by frequency and apply a
summary function to the resulting time series.
For data frames, aggregate returns a data frame with a factor variable
column for each group or level in the index vector, and a column of
numeric values resulting from applying the specified function to the
subgroups for each variable in the original data frame.

> aggregate(state.x77[,c("Population", "Area")],
+ by=state.division, FUN = sum)
Group Population Area
1 New England 12187 62951
2 Middle Atlantic 37269 100318
3 South Atlantic 32946 266909
4 East South Central 13516 178982
5 West South Central 20868 427791
6 East North Central 40945 244101
7 West North Central 16691 507723
8 Mountain 9625 856047
9 Pacific 28274 891972


Warning

For most numeric summaries, all variables in the data frame must be numeric. Thus, if we
attempt to repeat the above example with the kyphosis data, using Kyphosis as the by variable,
we get an error:

> aggregate(kyphosis, by=kyphosis$Kyphosis, FUN=sum)
Error in Summary.factor(structure(.Data = c(1, 1, ..:
A factor is not a numeric object
Dumped

For time series, aggregate returns a new, shorter time series that
summarizes the values in the time interval given by a new frequency.
For instance, you can quickly extract the yearly maximum, minimum,
and average from the monthly housing start data in the time series
hstart:

> aggregate(hstart, nf = 1, fun=max)
1966: 143.0 137.0 164.9 159.9 143.8 205.9 231.0 234.2 160.9
start deltat frequency
1966 1 1
> aggregate(hstart, nf = 1, fun=min)
1966: 62.3 61.7 82.7 85.3 69.2 104.6 150.9 90.6 54.9
start deltat frequency
1966 1 1
> aggregate(hstart, nf = 1, fun=mean)
1966: 99.6 110.2 128.8 125.0 122.4 173.7 198.2 171.5 112.6
start deltat frequency
1966 1 1

The by function allows you to partition a data frame according to one
or more categorical indices (conditioning variables) and then apply a
function to the resulting subsets of the data frame. Each subset is
considered a separate data frame; hence, unlike the FUN argument to
aggregate, the function passed to by does not need to have a numeric
result. Thus, by is useful for functions that work on data frames by
fitting models, for example.

> by(kyphosis, INDICES=kyphosis$Kyphosis, FUN=summary)
kyphosis$Kyphosis:absent
Kyphosis Age Number Start
absent:64 Min.: 1.00 Min.:2.00 Min.: 1.00
present: 0 1st Qu.: 18.00 1st Qu.:3.00 1st Qu.:11.00
Median: 79.00 Median:4.00 Median:14.00
Mean: 79.89 Mean:3.75 Mean:12.61
3rd Qu.:131.00 3rd Qu.:5.00 3rd Qu.:16.00
Max.:206.00 Max.:9.00 Max.:18.00
------------------------------------------------------
kyphosis$Kyphosis:present
Kyphosis Age Number Start
absent: 0 Min.: 15.00 Min.: 3.000 Min.: 1.000
present:17 1st Qu.: 73.00 1st Qu.: 4.000 1st Qu.: 5.000
Median:105.00 Median: 5.000 Median: 6.000
Mean: 97.82 Mean: 5.176 Mean: 7.294
3rd Qu.:128.00 3rd Qu.: 6.000 3rd Qu.:12.000
Max.:157.00 Max.:10.000 Max.:14.000

The function supplied as the FUN argument must accept a data
frame as its first argument; if you want to apply a function that does
not naturally accept a data frame as its first argument, you must
define a function that does so on the fly. For example, one common
application of the by function is to repeat model fitting for each level
or combination of levels; the modeling functions, however, generally
have a formula as their first argument. The following call to by shows
how to define the FUN argument to fit a linear model to each level:

> by(kyphosis, list(Kyphosis=kyphosis$Kyphosis,
+ Older=kyphosis$Age>105),
+ function(data)lm(Number~Start,data=data))
Kyphosis:absent
Older:FALSE
Call:
lm(formula = Number~Start, data = data)

Coefficients:
(Intercept) Start
4.885736 -0.08764492
Degrees of freedom: 39 total; 37 residual
Residual standard error: 1.261852

Kyphosis:present
Older:FALSE
Call:
lm(formula = Number~Start, data = data)

Coefficients:
(Intercept) Start
6.371257 -0.1191617
Degrees of freedom: 9 total; 7 residual
Residual standard error: 1.170313

Kyphosis:absent
Older:TRUE
. . .

As in the above example, you should define your FUN argument
simply. If you need additional parameters for the modeling function,
specify them fully in the call to the modeling function, rather than
attempting to pass them in through a “...” argument.
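A minimal sketch of this advice follows, using hypothetical data and a hypothetical wrapper named fit.coef. The extra na.action setting is written out inside the wrapper's call to lm, so by itself only ever passes the data frame.

```r
# Hypothetical data: in group a, y = 2x; in group b, y = 4 - x
d <- data.frame(g = factor(c("a", "a", "a", "b", "b", "b")),
                x = c(1, 2, 3, 1, 2, 3),
                y = c(2, 4, 6, 3, 2, 1))

# The wrapper takes a data frame first and spells out every
# additional modeling parameter itself.
fit.coef <- function(data)
    coef(lm(y ~ x, data = data, na.action = na.omit))

res <- by(d, d$g, fit.coef)
res   # group a: intercept 0, slope 2; group b: intercept 4, slope -1
```

Returning coef(...) rather than the whole fit keeps the printed result short; returning the lm object itself works the same way.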

Warning

Again, as with aggregate, you need to be careful that the function you apply with by works
with data frames, and often you need to be careful that it works with factors as well. For example,
consider the following two examples.

> by(kyphosis, kyphosis$Kyphosis, function(data)
+ apply(data,2,mean))
kyphosis$Kyphosis:absent
Kyphosis Age Number Start
NA NA 3.75 12.60938

kyphosis$Kyphosis:present
Kyphosis Age Number Start
NA 97.82353 5.176471 7.294118
Warning messages:
1: 64 missing values generated coercing from character to
numeric in: as.double(x)
2: 17 missing values generated coercing from character to
numeric in: as.double(x)

> by(kyphosis, kyphosis$Kyphosis, function(data)
+ apply(data,2,max))
Error in FUN(x): Numeric summary undefined for mode
"character"
Dumped

The functions mean and max are not very different, conceptually. Both
return a single-number summary of their input, and both are
meaningful only for numeric data. Because of implementation
differences, however, the first example returns values (with warnings
about the non-numeric columns) and the second example stops with an
error. When all the variables in your data frame are numeric, or when
you want to use by with a matrix, you should encounter few
difficulties.

> dimnames(state.x77)[[2]][4] <- "Life.Exp"
> by(state.x77[,c("Murder", "Population", "Life.Exp")],
+ state.region, summary)
INDICES:Northeast
Murder Population Life.Exp
Min. : 2.400 Min. : 472 Min. :70.39
1st Qu.: 3.100 1st Qu.: 931 1st Qu.:70.55
Median : 3.300 Median : 3100 Median :71.23
Mean : 4.722 Mean : 5495 Mean :71.26
3rd Qu.: 5.500 3rd Qu.: 7333 3rd Qu.:71.83
Max. :10.900 Max. :18080 Max. :72.48

INDICES:South
Murder Population Life.Exp
Min. : 6.20 Min. : 579 Min. :67.96
1st Qu.: 9.25 1st Qu.: 2622 1st Qu.:68.98
Median :10.85 Median : 3710 Median :70.07
Mean :10.58 Mean : 4208 Mean :69.71
3rd Qu.:12.27 3rd Qu.: 4944 3rd Qu.:70.33
Max. :15.10 Max. :12240 Max. :71.42
. . .
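
When a data frame mixes factors with numeric variables, one way to avoid the problem described in the warning above is to select the numeric columns before calling by. The sketch below assumes a small made-up data frame d; the names grp, age, and n are hypothetical.

```r
# Hypothetical data frame mixing a factor with numeric variables.
d <- data.frame(grp = factor(c("a", "a", "b", "b")),
                age = c(10, 20, 30, 50),
                n   = c(1, 3, 5, 7))

# Keep only the numeric columns before handing the data to apply.
num <- sapply(d, is.numeric)
res <- by(d[, num], d$grp, function(data) apply(data, 2, mean))

res[[1]]   # column means for group "a"
res[[2]]   # column means for group "b"
```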

Closely related to the by and aggregate functions is the tapply
function, which allows you to partition a vector according to one or
more categorical indices. Each index is a vector of logical or factor
values the same length as the data vector; to use more than one index
create a list of index vectors.

For example, suppose you want to compute a mean murder rate by
region. You can use tapply as follows.

> tapply(state.x77[,"Murder"], state.region, mean)
 Northeast    South North Central     West
  4.722222 10.58125         5.275 7.215385

To compute the mean murder rate by region and income, use tapply
as follows.

> income.lev <- cut(state.x77[,"Income"],
+ summary(state.x77[,"Income"])[-4])
> income.lev
[1] 1 4 3 1 4 4 4 3 4 2 4 2 4 2 3 3 1
[18] 1 1 4 3 3 3 NA 2 2 2 4 2 4 1 4 1 4
[35] 3 1 3 2 3 1 2 1 2 2 1 3 4 1 2 3
attr(, "levels"):
[1] "3098+ thru 3993" "3993+ thru 4519"
[3] "4519+ thru 4814" "4814+ thru 6315"

> tapply(state.x77[,"Murder"],list(state.region,
+ income.lev),mean)
3098+ thru 3993 3993+ thru 4519
Northeast 4.10000 4.700000
South 10.64444 13.050000
North Central NA 4.800000
West 9.70000 4.933333
4519+ thru 4814 4814+ thru 6315
Northeast 2.85 6.40
South 7.85 9.60
North Central 5.52 5.85
West 6.30 8.40
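
The same pattern can be tried on data small enough to check by hand. This sketch uses made-up vectors (rate, region, and income are hypothetical names, not S-PLUS data sets) and explicit break points for cut.

```r
# Hypothetical data: a rate, a region, and an income figure for six units.
rate   <- c(2, 4, 6, 8, 10, 12)
region <- factor(c("N", "N", "S", "S", "S", "N"))
income <- c(10, 20, 30, 40, 50, 60)

# One index: mean rate by region.
by.region <- tapply(rate, region, mean)

# Two indices: pass a list. cut() bins income into two levels.
income.lev <- cut(income, breaks = c(0, 30, 60))
by.both <- tapply(rate, list(region, income.lev), mean)
by.both
```

The result of the two-index call is a matrix with one row per region and one column per income level.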

ADDING NEW CLASSES OF VARIABLES TO DATA FRAMES


The manner in which objects of a particular data type are included in
a data frame is determined by that type’s method for the generic
function as.data.frame. The default method for this generic function
uses the data.class function to determine an object’s type. Thus, even
data types without formal class attributes, such as numeric or
character vectors, can have specific methods. The behavior for most
built-in types is derived from one of the six basic cases shown in the
table below.

Table 3.1: Rules for combining objects into data frames.

Data Types     Sub-types              Rules

vector         numeric, complex,      1. contribute a single variable as is
               factor, ordered,
               rts, its, cts

character      character, logical,    1. converted to a factor data type
               category               2. contribute a single variable

matrix         matrix                 1. each column creates a separate variable
                                      2. column names used for variable names

list           list                   1. each component creates one or more
                                         separate variables
                                      2. variable names assigned as appropriate
                                         for individual components (column
                                         names for matrices, etc.)

model.matrix   model.matrix           1. object becomes a single variable in
                                         result

data.frame     data.frame, design     1. each variable becomes a variable in
                                         result
                                      2. variable names used for variable names

As you add new classes, you can ensure that they are properly
behaved in data frames by defining your own as.data.frame method
for each new class. In most cases, you can use one of the six paradigm
cases, either as is or with slight modifications. For example, the
character method is a straightforward modification of the vector
method:

> as.data.frame.character
function(x, row.names = NULL, optional = F,
na.strings = "NA", ...)
as.data.frame.vector(factor(x,exclude =na.strings),
row.names,optional)

This method converts its input to a factor, then calls the function
as.data.frame.vector.

You can create new methods from scratch, provided they have the
same arguments as as.data.frame.

> as.data.frame
function(x, row.names = NULL, optional = F, ...)
UseMethod("as.data.frame")

The argument “...” allows the generic function to pass any
method-specific arguments to the appropriate method.
If you’ve already built a function to construct data frames from a
certain class of data, you can use it in defining your as.data.frame
method. Your method just needs to account for all the formal
arguments of as.data.frame. For example, suppose you have a class
loops and a function make.df.loops for creating data frames from
objects of that class. You can define a method as.data.frame.loops
as follows.

> as.data.frame.loops
function(x, row.names = NULL, optional = F, ...)
{
    x <- make.df.loops(x, ...)
    if(!is.null(row.names))
    {   row.names <- as.character(row.names)
        if(length(row.names) != nrow(x))
            stop(paste("Provided", length(row.names),
                "names for", nrow(x), "rows"))
        attr(x, "row.names") <- row.names
    }
    x
}

This method takes account of user-supplied row names, but ignores
the argument optional, a flag that is TRUE when the method is not
expected to generate non-trivial row names or variable names for a
calling function.
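
To see the dispatch at work, the sketch below supplies a stand-in for make.df.loops and calls the generic on a made-up object. The guide does not show make.df.loops, so its body here, along with the components from and to, is purely an assumption for illustration.

```r
# Hypothetical constructor: builds a data frame from a "loops" object.
make.df.loops <- function(x, ...)
{
    data.frame(from = x$from, to = x$to)
}

# The method from the text, with a body that follows the same steps.
as.data.frame.loops <- function(x, row.names = NULL, optional = FALSE, ...)
{
    x <- make.df.loops(x, ...)
    if(!is.null(row.names)) {
        row.names <- as.character(row.names)
        if(length(row.names) != nrow(x))
            stop(paste("Provided", length(row.names),
                       "names for", nrow(x), "rows"))
        attr(x, "row.names") <- row.names
    }
    x
}

# A made-up object of class "loops".
lp <- list(from = c(1, 5, 9), to = c(4, 8, 12))
oldClass(lp) <- "loops"

# The generic dispatches on the class and finds the new method.
df <- as.data.frame(lp, row.names = c("a", "b", "c"))
df
```

Supplying the wrong number of row names triggers the stop call, so the length check can be exercised the same way.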

DATA FRAME ATTRIBUTES


Data frames, like all data objects, have the implicit attributes "length"
and "mode". Because data frames are represented internally as lists,
they have mode "list" and length equal to their number of variables,
which is the number of components of their list representation.
Additional attributes of a data frame can be examined by calling the
attributes function:

> attributes(auto)
$names:
[1] "Price" "Country" "Reliab" "Mileage" "Type"

$row.names:
[1] "AcuraIntegra4" "Audi1005" "BMW325i6"
[4] "ChevLumina4" "FordFestiva4" "Mazda929V6"
[7] "MazdaMX-5Miata" "Nissan300ZXV6" "OldsCalais4"
[10] "ToyotaCressida6"

$class:
[1] "data.frame"

The variable names are stored in the names attribute and the row
names are stored in the row.names attribute. There is also a class
attribute with value data.frame; all data frames have this class
attribute.
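
The implicit and explicit attributes can also be queried one at a time. The following sketch builds a small made-up data frame (the names d, price, and country are hypothetical) and inspects each attribute in turn.

```r
# Hypothetical two-row data frame.
d <- data.frame(price = c(10, 20), country = factor(c("USA", "Japan")))

length(d)              # number of variables
mode(d)                # "list"
names(d)               # variable names
attr(d, "row.names")   # row names
class(d)               # "data.frame"

# Attributes of a single variable remain available inside the data frame.
levels(d$country)
```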

Data frames preserve most attributes of special types of vectors, and
these attributes may be accessed after the original objects have been
combined into data frames. For example, categorical data have class
and levels attributes preserved in data frames. You can access the
defining attributes of a particular variable by specifying the variable
in the data frame and passing it to the attributes function. Many of
the variables in the cu.summary data frame are categorical—for
example, the country of manufacture.

> attributes(cu.summary[,"Country"])
$levels:
[1] "Brazil" "England" "France" "Germany"
[5] "Japan" "Japan/USA" "Korea" "Mexico"
[9] "Sweden" "USA"

$class:
[1] "factor"

The levels attribute is as you would expect for a categorical variable.
Additionally, there is a class attribute with a value of factor. Objects
of class factor are discussed in the section Factors and Ordered
Factors (page 37). One attribute that is not preserved is the names
attribute; the names for each variable are taken to be the row names
of the data frame.
The attributes of a data frame are summarized in the table below. For
attributes associated with a particular variable in a data frame, see the
attribute section for the corresponding object type.

Table 3.2: Attributes of Data Frames.

Attribute      Description

"length"       The number of variables in the data frame.

"mode"         All data frames are of mode "list".

"names"        The names of the variables (columns) in the data frame.

"row.names"    The names of the rows in the data frame.

"class"        All data frames are of class "data.frame".

WRITING FUNCTIONS IN S-PLUS

Introduction 73
The Structure of Functions 75
Function Names and Operators 75
Arguments 78
The Function Body 78
Return Values and Side Effects 78
Elementary Functions 80
Operations on Complex Numbers 84
Summary Functions 85
Comparison and Logical Operators 86
Assignments 89
Testing and Coercing Data 91
Operating on Subsets of Data 94
Subscripting Vectors 94
Subscripting Matrices and Arrays 98
Subscripting Lists 102
Subscripting Data Frames 105
Organizing Computations 107
Programming Style 107
Flow of Control 108
Notes Regarding Commented Code 120
Specifying Argument Lists 121
Formal and Actual Names 121
Specifying Default Arguments 122
Handling Missing Arguments 122
Lazy Evaluation 123
Variable Numbers of Arguments 124
Required and Optional Arguments 125
Error Handling 127
Input and Output 130
Data Input 130
Data Output 130
Connections 143
Raw Data Objects 154
Wrap-Up Actions 158
Writing Special Functions 162
Operators 162
Extraction and Replacement Functions 163
References 169

INTRODUCTION
Programming in S-PLUS consists largely of writing functions. The
simplest functions arise naturally as shorthand for frequently-used
combinations of S-PLUS expressions.
For example, consider the interquartile range, or IQR, of a data set.
Given a collection of data points, the IQR is the difference between
the upper and lower (or third and first) quartiles of the data. Although
S-PLUS has no built-in function for calculating the IQR, it does have
functions for computing quantiles and differences of numeric vectors.
The following two commands define and test a function that returns
the IQR of a numeric vector.

> iqr <- function(x) { diff(quantile(x, c(0.25, 0.75))) }
> iqr(lottery.payoff)
   75%
169.75

You can build more complicated functions either by adding new
features incrementally to simpler functions, or by designing whole
programs from scratch. As your functions grow more complex,
proper use of programming features becomes more important.
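
As one possible illustration of adding features incrementally, the sketch below extends iqr with an option for missing values. The name iqr2 and its na.rm argument are our own additions for this example, not built-in S-PLUS functions.

```r
# First version, as defined above.
iqr <- function(x) { diff(quantile(x, c(0.25, 0.75))) }

# Extended version: optionally drop missing values, and
# return a plain number without the "75%" label.
iqr2 <- function(x, na.rm = FALSE)
{
    q <- quantile(x, c(0.25, 0.75), na.rm = na.rm)
    as.vector(diff(q))
}

iqr2(c(1:9, NA), na.rm = TRUE)
```

The extended function keeps the original behavior as its default, so existing calls continue to work.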
This chapter describes the basic techniques for writing functions in
S-PLUS. It first outlines the structure underlying all S-PLUS functions,
and then describes some of the most useful functions for manipulating
data. A section on organizing computations gives tips on designing
functions that take advantage of the strengths of S-PLUS. Later
sections introduce techniques for argument handling, error handling,
input and output, and wrap-up actions. From these few simple tools
and techniques, you can build many useful functions.
To run the examples in this chapter, you will need to create functions
with an editor. There are many different approaches to editing
functions in S-PLUS, but the simplest way to get started is with the
Edit function. The built-in function Edit creates a function template
with the proper structure when called with a name that does not
correspond to an existing S-PLUS object. Thus, to create a new
function called newfunc, call Edit as follows:

> Edit(newfunc)

Edit the template as desired in the Script window that appears. To
source in the modified function, select Script > Run from the menu,
press the F10 key, or use the Run button on the Script toolbar.
To edit an existing function, call Edit using the function’s name.
Alternatively, right-click on the function’s name in the Object
Explorer and select Edit from the context-sensitive menu. Refer to
the section Editing Objects (page 14) for more details.

THE STRUCTURE OF FUNCTIONS


All S-PLUS functions have the same structure: they consist of the
reserved word function, an argument list which may be empty, and a
body. In this section, we discuss these components in detail. In
addition, we discuss programming concepts such as return values,
side effects, and coercion. For completeness, we also include sections
on elementary functions, complex operations, and logical operators.

Function Names and Operators

Most functions are associated with names when they are defined. The
form of the name conveys some important information about the
nature of the function. Most functions have simple, relatively short,
alphanumeric names that begin with a letter, such as plot,
na.exclude, or anova. These functions are always used in the form
function.name(arglist).

Operators are special functions for performing mathematical or logical
operations on one or two arguments. They are most convenient to use
in infix form, in which they appear between two arguments. Familiar
examples of operators are +, -, and *. The names of such functions
consist of the symbol used to represent them enclosed by double
quotes. Thus, "+" is the function name corresponding to the addition
operator +. You can use names to call operators as functions in the
ordinary way. For example, the call "+"(2,3) is represented by 2+3 in
infix form; both commands return the number 5.
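
Because an operator name is an ordinary function name, it can also be handed to functions that expect a function as an argument. The short sketch below shows both uses; it is an illustration added here rather than an example from the guide.

```r
# Infix and functional forms are equivalent.
2 + 3
"+"(2, 3)

# The quoted name can be passed wherever a function is expected.
outer(1:3, 1:3, "*")    # 3 x 3 multiplication table
```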
A complete list of built-in operators is provided in Table 4.1. In
addition to the predefined operators in the table, S-PLUS allows you to
write your own infix operators. For more details, see the section
Operators (page 162).
Operators listed higher in Table 4.1 have higher precedence than
those listed below. Operators on the same line in the table have equal
precedence, and evaluation proceeds from left to right when more
than one of these operators appear in an expression. For example,
consider the command:

> 7 + 5 - 8^2 / 19 * 2
[1] 5.263158

Here, the exponentiation is done first, 8^2=64. Division has the same
precedence as multiplication, but appears to the left of the
multiplication in the expression. Therefore, it is performed first:
64/19=3.368421. Next comes the multiplication:
3.368421*2=6.736842. Finally, S-PLUS performs the addition and
subtraction: 7+5-6.736842=5.263158.
You can override the normal precedence of operators by grouping
with parentheses or curly braces:

> (7 + 5 - 8^2) / (19 * 2)
[1] -1.368421
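
The order of evaluation described above can be checked by carrying out the steps one at a time. In this sketch, the intermediate names s1 through s4 are ours.

```r
# Evaluate 7 + 5 - 8^2 / 19 * 2 one step at a time.
s1 <- 8^2          # exponentiation first
s2 <- s1 / 19      # then division (leftmost of / and *)
s3 <- s2 * 2       # then multiplication
s4 <- 7 + 5 - s3   # finally addition and subtraction, left to right

# The stepwise result agrees with the single expression.
all.equal(s4, 7 + 5 - 8^2 / 19 * 2)

# Parentheses change the grouping and therefore the result.
(7 + 5 - 8^2) / (19 * 2)
```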

The integer divide operator in S-PLUS, %/%, produces an integral
quotient. For two numbers a and b, the S-PLUS expression a%/%b
computes q in Euclid’s algorithm: a = qb + r where 0 ≤ r < b. The
modulus operator %% computes the remainder r.
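
A quick check of this relationship, using two positive numbers chosen arbitrarily for the example:

```r
a <- 17
b <- 5
q <- a %/% b    # integer quotient
r <- a %% b     # remainder

# The two results satisfy a = q*b + r with 0 <= r < b.
a == q * b + r
r >= 0 && r < b
```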
Table 4.1: Precedence of operators. Operators listed higher in the table have higher
precedence than those listed below, and operators on the same line have equal
precedence.

Operator              Use

$                     component selection
@                     slot selection
[ [[                  subscripts, elements
^                     exponentiation
-                     unary minus
:                     sequence operator
%% %/% %*%            modulus, integer divide, matrix multiply
* /                   multiply, divide
+ - ?                 add, subtract, help
< > <= >= == !=       comparison
!                     not
& | && ||             logical and, logical or
~                     formulas
<<-                   permanent assignment
<- -> _ =             assignments

Note

When using the ^ operator, the exponent must be an integer if the base is a negative number. If
you require a complex result when the base is negative, be sure to coerce it to mode "complex".
See the section Operations on Complex Numbers (page 84) for more details.
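
A few short expressions can make the ordering in Table 4.1 concrete. This is an illustrative sketch rather than an example from the guide; each comment states the grouping implied by the table.

```r
-2^2         # ^ binds tighter than unary minus: -(2^2)
(-2)^2       # parentheses force the negation first
-1:3         # unary minus binds tighter than ":" : (-1):3
1:3 * 2      # ":" binds tighter than "*": (1:3) * 2
5 %% 3 * 2   # %% binds tighter than "*": (5 %% 3) * 2
```

When in doubt about the grouping, adding parentheses costs nothing and documents the intent.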

Another special type of function is the replacement or left-side function.
It has the appearance of an ordinary function on the left side of an
assignment. For example, the expression dim(x) <- c(3,4) uses the
replacement function "dim<-". S-PLUS interprets this expression as
the ordinary assignment x <- "dim<-"(x,c(3,4)). The function
"dim<-" is the replacement function corresponding to the ordinary
function dim.
Replacement functions can be defined for extraction functions, which
are functions designed to return some specific portion or attribute of a
data object. Common extraction functions are the subscript operator
[], the dim function, and the names function. For details, see the
online help files for these functions and the section Extraction and
Replacement Functions (page 163).
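
The equivalence between the replacement form and the ordinary assignment can be verified directly. The sketch below applies both forms to two copies of the same vector; the object names are ours.

```r
x <- 1:12
y <- 1:12

dim(x) <- c(3, 4)           # replacement form
y <- "dim<-"(y, c(3, 4))    # the equivalent ordinary assignment

identical(x, y)

# names has a replacement function as well.
v <- c(10, 20, 30)
names(v) <- c("a", "b", "c")
v["b"]
```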

Arguments

Arguments to a function specify the data to be operated on, and also
pass processing parameters to the function. Not all functions accept
arguments. For example, the date function can only be called with
the syntax date():

> args(date)
function()

In contrast, the lm function accepts many arguments:

> args(lm)
function(formula, data, weights, subset, na.action,
method = "qr", model = F, x = F, y = F, contrasts = NULL,
...)

Functions without arguments are, by design, rigid and single-purpose.
Their behavior can be modified only by editing the function.
Arguments allow you to build multi-purpose functions with behavior
that can be easily modified whenever a function is called. For a
complete discussion of allowable argument lists, see the section
Specifying Argument Lists (page 121).
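
The contrast can be sketched with two small functions. Both report.pi and report are hypothetical names invented for this example.

```r
# No arguments: the behavior is fixed.
report.pi <- function() { round(pi, 2) }

# One data argument and one processing parameter with a default value.
report <- function(x, digits = 2) { round(x, digits) }

report.pi()
report(pi)               # same result, using the default
report(pi, digits = 4)   # behavior changed at call time
```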

The Function Body

The body of a function is the part that actually does the work. It
consists of a sequence of S-PLUS statements and expressions. If there
is more than one expression, the entire body must be enclosed in
braces. Whether braces should always be included is a matter of
programming style; we recommend including them in all of your
functions because it makes maintenance less accident-prone. By
adding braces when you define a single-line function, you ensure they
won’t be forgotten when you add functionality to it.
Most of this chapter (and, in fact, most of this book) is devoted to
showing you how to write the most effective function body possible.
This involves organizing the computations efficiently and naturally,
expressing them with suitable S-PLUS expressions, and returning the
appropriate information.

Return Values and Side Effects

Functions are designed to accomplish something, and if everything
goes as planned, a function accomplishes something every time it is
called. Most functions do one thing: return a value. A return value can
be any valid S-PLUS expression, although it is usually a transformed
version of the input data. In general, values returned from functions
are not automatically saved. Therefore, most calls to functions also
involve an assignment:

> y <- f(x)

In this expression, the return value from the function f on the input x
is preserved in the object y for further analysis.

Note

In compiled languages such as C and Fortran, you can pass arguments directly to a function that
modifies the argument values in memory. In S-PLUS however, all arguments are passed by value.
This means that only copies of the arguments are modified throughout the body of a function.
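
A small sketch makes the consequence of passing by value visible. The function double.first is a made-up example.

```r
# The function changes its local copy of v only.
double.first <- function(v)
{
    v[1] <- v[1] * 2
    v
}

x <- c(1, 2, 3)
y <- double.first(x)

x   # unchanged by the call
y   # first element doubled
```

To keep the modified values, assign the return value, as is done with y here.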

Sometimes, you may want a function to do something besides return
an S-PLUS expression. For instance, you may want to print something,
draw a graph, or change some S-PLUS session options. Because the
main goal of functions is to return values, these other actions are
collectively called side effects. The section Data Output (page 130)
discusses return values and side effects in more detail.
The combination of a function’s side effects and its return value can
be used to good advantage in some situations. For example, the
options function has the side effect of changing the current S-PLUS
session options. It also returns a value that consists of the options in
effect before the current call. Thus, you can use options within a
function not only to change the options in effect, but also to save the
old options for restoration when the function exits. The following
commands illustrate this:

options.old <- options(width=55)
on.exit(options(options.old))

By assigning the return value of options to options.old, we save the
old width setting. The side effect of the first command changes
options to use a width of 55 characters; this takes place whether or
not we assign the return value. The on.exit function performs a
given set of actions when the calling function exits. In this example,
on.exit restores the old width value at the end of the calling function.
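
Placed inside a complete function, the two commands look like the following sketch. The helper short.format is hypothetical, and it changes the digits option rather than width so that the effect shows in the returned value.

```r
# Hypothetical helper: format a number using three significant digits,
# restoring the caller's digits option when the function exits.
short.format <- function(x)
{
    old <- options(digits = 3)
    on.exit(options(old))
    format(x)
}

before <- options()$digits
short.format(pi)
after <- options()$digits

before == after    # the option is restored once the function exits
```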

Elementary Functions

In addition to the infix operators introduced in the section Function
Names and Operators (page 75), S-PLUS includes a variety of
elementary mathematical functions that act in a vectorized way on
numeric data sets. That is, the functions manipulate numeric vectors
the same way as single numeric elements. The elementary functions
include the familiar trigonometric and exponential functions, as well
as several functions for computing numerical results.

The functions listed in Table 4.2 are the vectorized math functions
implemented internally as part of the S-PLUS language. S-PLUS has
many other built-in mathematical functions, some of which are
written wholly in the S-PLUS language and some of which are written
to take advantage of existing algorithms in Fortran or C. See Chapter
36, Mathematical Computing in S-Plus, in the Guide to Statistics,
Volume 2 for more information.
Table 4.2: Common elementary mathematical functions.

Name                  Operation

sort, rev             the input sorted in ascending or reverse order
sqrt                  square root
abs                   absolute value
sin, cos, tan         trigonometric functions
asin, acos, atan      inverse trigonometric functions
sinh, cosh, tanh      hyperbolic trigonometric functions
asinh, acosh, atanh   inverse hyperbolic trigonometric functions
exp, log              exponential and natural logarithm (base e)
log10                 common logarithm (base 10)
logb                  logarithm for bases other than e and 10
gamma, lgamma         gamma function and its natural logarithm
ceiling               closest integer not less than the input
floor                 closest integer not greater than the input
trunc                 closest integer between the input and zero
round                 closest integer to the input
signif                the input rounded to a specified number of
                      significant digits
cummax, cummin        cumulative maximum and minimum
cumsum, cumprod       cumulative sum and product
pmax, pmin            parallel maximum and minimum

Examples

Each function in Table 4.2 acts element-by-element on its argument.
For example:

> M <- matrix(c(12,2,19,15,9,14,6,2,11,10,7,19), nrow=3)
> M
     [,1] [,2] [,3] [,4]
[1,]   12   15    6   10
[2,]    2    9    2    7
[3,]   19   14   11   19

> sqrt(M)
         [,1]     [,2]     [,3]     [,4]
[1,] 3.464102 3.872983 2.449490 3.162278
[2,] 1.414214 3.000000 1.414214 2.645751
[3,] 4.358899 3.741657 3.316625 4.358899

> tan(M)
           [,1]       [,2]         [,3]      [,4]
[1,] -0.6358599 -0.8559934   -0.2910062 0.6483608
[2,] -2.1850399 -0.4523157   -2.1850399 0.8714480
[3,]  0.1515895  7.2446066 -225.9508465 0.1515895

Note that both sqrt(M) and tan(M) return objects that are the same
shape as M. The element in the ith row and jth column of the matrix
returned by sqrt(M) is the square root of the corresponding element
in M. Likewise, the element in the ith row and the jth column of
tan(M) is the tangent of the corresponding element (assumed to be in
radians).
The trunc function acts like floor for elements greater than 0 and
like ceiling for elements less than 0:

> y <- c(-2.6, 1.5, 9.7, -1.0, 25.7, -4.6, -7.5, -2.7, -0.6,
+ -0.3, 2.8, 2.8)
> y
[1] -2.6 1.5 9.7 -1.0 25.7 -4.6 -7.5 -2.7 -0.6
[10] -0.3 2.8 2.8

> trunc(y)
[1] -2 1 9 -1 25 -4 -7 -2 0 0 2 2

> ceiling(y)
[1] -2 2 10 -1 26 -4 -7 -2 0 0 3 3

> floor(y)
[1] -3 1 9 -1 25 -5 -8 -3 -1 -1 2 2

The round function accepts an optional argument digits that allows
you to specify how many digits to include after the decimal point:

> round(sqrt(M), digits=3)
      [,1]  [,2]  [,3]  [,4]
[1,] 3.464 3.873 2.449 3.162
[2,] 1.414 3.000 1.414 2.646
[3,] 4.359 3.742 3.317 4.359

The section Formatting Output (page 131) provides examples that
further illustrate the round function.
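
The cumulative and parallel functions listed in Table 4.2 are not illustrated above. The following sketch, added here with a made-up vector v, shows how they behave alongside signif and logb.

```r
v <- c(3, 1, 4, 1, 5)

cumsum(v)     # running total
cummax(v)     # running maximum
pmax(v, 2)    # elementwise maximum of v and 2
pmin(c(1, 5, 3), c(4, 2, 6))   # elementwise minimum of two vectors

signif(123456, 2)   # rounded to two significant digits
logb(8, base = 2)   # logarithm to base 2
```

Note that the cumulative functions return a vector as long as their input, while pmax and pmin recycle shorter arguments such as the single value 2.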

Integer Arithmetic

By default, S-PLUS performs integer arithmetic if all arguments are
integers, and real arithmetic if any arguments are real. In particular, if
you pass an integer argument to a built-in function, S-PLUS attempts
to return an integer value. If an integer value cannot be computed for
the expression, S-PLUS returns NA. Earlier versions of S-PLUS
automatically coerced integers to real numbers for storage purposes
and performed real arithmetic by default. This changed in S-PLUS 5.x,
however, and now the coercion must be done explicitly.
For example, here is the code for an S-PLUS function that computes
the factorial of a number. We discuss this function in more detail in
the section Wrap-Up Actions (page 158):

fac1024 <- function(n)
{
    old <- options(expressions = 1024)
    on.exit(options(old))
    if(n <= 1) { return(1) }
    else { n * Recall(n-1) }
}

If we call fac1024 with n=12 it works fine, but n=13 causes it to return
NA:

> fac1024(12)
[1] 479001600

> fac1024(13)
[1] NA

This is because S-PLUS attempts to compute an integer value for 13!
and overflows in the process. To force S-PLUS to compute real
solutions, you must coerce the argument to a real number as follows:

> fac1024(13.0)
[1] 6227020800

Alternatively, we can replace the third line in the body of fac1024 so
that it always performs real arithmetic:

if(n <= 1) { return(1.0) }

With the function defined like this, the call fac1024(13) finishes
without overflowing.
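
A third possibility, sketched here as our own variation rather than the guide's method, is to coerce the argument inside the function so that callers need not remember to do so. The name fac is hypothetical, and the options call is omitted because the recursion in this example is shallow.

```r
fac <- function(n)
{
    n <- as.double(n)          # force real arithmetic from the start
    if(n <= 1) { return(1) }
    else { n * Recall(n - 1) }
}

# An explicitly integer argument no longer overflows.
fac(as.integer(13))
```

Note that R stores a literal such as 13 as a real number, so the overflow described above is specific to the integer handling in S-PLUS; the explicit as.integer call makes the example behave the same way in both.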

Operations on Complex Numbers

You represent complex literals in S-PLUS as a sum of the form a + bi,
where a and b are real numbers. In general, arithmetic operations on
complex numbers work as you would expect. Because the addition
and subtraction operators have lower precedence than the *, /, and ^
operators, though, you must use parentheses to group complex
arguments in most cases:

> (2-3i)*(4+6i)
[1] 26+0i

> (2+3i)^(3+2i)
[1] 4.714144-4.569828i
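
To see why the parentheses matter, compare the first product above with the same expression written without them. This comparison is our own illustration.

```r
# Without parentheses, * is applied before + and -,
# so this is 2 - (3i * 4) + 6i.
2-3i*4+6i

# With parentheses, the two complex numbers are multiplied.
(2-3i)*(4+6i)
```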

Warning

Do not leave any space between the real number b and the symbol i when defining complex
numbers. If space is included between b and i, the following syntax error is returned:
Problem: Syntax error: illegal name ("i")

By default, S-PLUS performs real arithmetic if all arguments are real,
and complex arithmetic if any arguments are complex. In particular,
if you pass a real argument to a built-in function, S-PLUS attempts to
return a real value. If a real value cannot be computed for the
expression, S-PLUS returns NA and issues a domain error message. For
example, here is the result when we pass the real number -1 to the
built-in square root function sqrt:

> sqrt(-1)
[1] NA

To force S-PLUS to consider complex solutions, you must coerce the
arguments to mode "complex", typically by using the function as:

> sqrt(as(-1, "complex"))
[1] 6.123032e-017+1i

Note that the real part of the result, 6.123032e-017, is essentially
equal to zero. Thus, (to machine precision) S-PLUS returns 1i as the
square root of -1, which is what we expect. Alternatively, you can
include a zero-valued imaginary part to coerce real numbers to mode
"complex":

> sqrt(-1+0i)
[1] 6.123032e-017+1i

In addition to the ordinary operators and elementary mathematical
functions, S-PLUS provides five special functions for manipulating
complex numbers: Re, Im, Mod, Arg, and Conj. The Re and Im functions
extract the real and imaginary parts, respectively, from a complex
number. For example:

> x <- as(-3, "complex")
> x^(1/3)
[1] 0.7211248+1.249025i

> Re(x^(1/3))
[1] 0.7211248

> Im(x^(1/3))
[1] 1.249025

The Conj function returns the conjugate of a complex number:

> Conj(x^(1/3))
[1] 0.7211248-1.249025i

The Mod and Arg functions return the modulus and argument,
respectively, for the polar representation of a complex number:

> Mod(2 + 2i)
[1] 2.828427

> Arg(2 + 2i)
[1] 0.7853982
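
The five functions are related through the polar form of a complex number. The sketch below, added as an illustration with object names z and w of our choosing, rebuilds a number from its modulus and argument and checks the conjugate identity.

```r
z <- 2 + 2i

# Rebuild z from its polar representation.
w <- Mod(z) * exp(1i * Arg(z))
all.equal(z, w)

# The product of z and its conjugate is the squared modulus,
# a real number with zero imaginary part.
all.equal(Re(z * Conj(z)), Mod(z)^2)
Im(z * Conj(z))
```

The comparisons use all.equal rather than == because the polar reconstruction is exact only to machine precision.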

Summary Functions

The mathematical operators and functions introduced so far act
element-by-element, generally returning a value the same length and
mode as the input data. S-PLUS also includes a number of functions
for summarizing data. Summary functions accept an input vector or
matrix and return a single value that summarizes the data in some
way. For example, the sum and prod functions return the sum and
product, respectively, of their arguments. Other useful summary
functions are listed in Table 4.3. For details on any of the functions
listed in the table, see the online help or Chapter 4, Descriptive
Statistics, in the Guide to Statistics, Volume 1.

Table 4.3: Common functions for summarizing data.

Name                 Operation

min, max             Return the smallest and largest values of the input
                     arguments.

range                Returns a vector of length two containing the minimum
                     and maximum of all the elements in all the input
                     arguments.

mean, median         Return the arithmetic mean and median of the input
                     arguments. The optional trim argument to mean allows
                     you to discard a specified fraction of the largest and
                     smallest values.

var                  Returns the variance of a vector, the
                     variance-covariance of a matrix, or covariances
                     between matrices or vectors.

stdev                Returns the standard deviation of a numeric vector.

quantile             Returns user-requested sample quantiles for a given
                     data set. For example,
                     > quantile(corn.rain, c(0.25, 0.75))
                        25%    75%
                      9.425 12.075

mad                  Returns the median absolute deviation of a numeric
                     vector.

cor                  Returns the correlation matrix of a data matrix, or
                     correlations between matrices or vectors.

skewness, kurtosis   Return the skewness and kurtosis of a numeric vector.

summary              Returns the minimum, maximum, first and third
                     quartiles, mean, and median of a numeric vector.
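
Several of these functions can be compared on a small vector containing one outlier. The data below are made up for the purpose of the sketch.

```r
x <- c(1, 2, 3, 4, 100)

range(x)              # smallest and largest values
mean(x)               # pulled upward by the outlier
median(x)             # resistant to the outlier
mean(x, trim = 0.2)   # drops the smallest and largest 20% first
quantile(x, c(0.25, 0.75))
```

With five observations and trim = 0.2, one value is discarded from each end, so the trimmed mean is the mean of 2, 3, and 4.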

Comparison and Logical Operators

Table 4.4 lists the S-PLUS operators for comparison and logic.
Comparisons and logical operations are frequently convenient for
such tasks as extracting subsets of data. In addition, conditionals using
logical comparisons play an important role in the flow of control in
functions, as we discuss in the section Organizing Computations
(page 107).
Table 4.4: Logical and comparison operators.

Operator Explanation Operator Explanation

== equal to != not equal to

> greater than < less than

>= greater than or equal to <= less than or equal to

& vectorized AND | vectorized OR

&& control AND || control OR

! not

Notice that S-PLUS has two types of logical operators for AND and
OR operations. Table 4.4 refers to the two types as “vectorized” and
“control.” The vectorized operators evaluate AND and OR expressions
element-by-element, returning a logical vector containing TRUE and
FALSE as appropriate. For example:

> x <- c(1.9, 3.0, 4.1, 2.6, 3.6, 2.3, 2.8, 3.2, 6.6,
+ 7.6, 7.4, 1.0)
> x
[1] 1.9 3.0 4.1 2.6 3.6 2.3 2.8 3.2 6.6 7.6 7.4 1.0

> x<2 | x>4
[1] T F T F F F F F T T T T

> x>2 & x<4
[1] F T F T T T T T F F F F

In contrast, the control operators are used to construct the
conditions tested in if statements. The expressions in such
statements are expected to have a single logical value, rather than a
vector of logical values.

The control operators have the additional property that they are
evaluated only as far as necessary to return a correct value. For
example, consider the following expression for some numeric vector
y:

> any(x > 1) && all(y < 0)

The any function evaluates to TRUE if any of the elements in any of its
arguments are true; it returns FALSE if all of the elements are false.
Likewise, the all function evaluates to TRUE if all of the elements in all
of its arguments are true; it returns FALSE if there are any false
elements. S-PLUS initially evaluates only the first condition in the
above expression, any(x > 1). After determining that x > 1 for some
element in x, only then does S-PLUS proceed to evaluate the second
condition, all(y < 0).
Similarly, consider the following command:

> all(x >= 1) || 2 > 7
[1] T

S-PLUS stops evaluation with all(x >= 1) and returns TRUE, even
though the statement 2 > 7 is false. Because the first condition is true,
so is the entire expression.
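
Because evaluation stops as soon as the result is known, a control operator can guard a test that would otherwise fail. The following is a minimal sketch; the function name first.positive is hypothetical and is not part of S-PLUS. The second condition is evaluated only when the vector has at least one element:

```r
# Hypothetical helper: is the first element of v positive?
first.positive <- function(v) {
    # length(v) > 0 is evaluated first; v[1] > 0 is evaluated
    # only when the first condition is TRUE.
    if(length(v) > 0 && v[1] > 0) TRUE else FALSE
}

first.positive(c(3, -1))     # TRUE
first.positive(numeric(0))   # FALSE; v[1] > 0 is never evaluated
```

With the vectorized operator & in place of &&, both conditions would be evaluated, and the comparison on the empty vector would not yield a usable single logical value.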
Logical comparisons involving the symbolic constants NA and NULL
always return NA, regardless of the type of operator used. For
example:

> y <- c(3, NA, 4)
> y
[1] 3 NA 4

> y > 0
[1] T NA T

> all(y > 0)
[1] NA

To test whether a value is missing, use the function is.na:

> is.na(y)
[1] F T F

To test whether a component of a list or an attribute of an object is
null, use the is.null function:

> is.null(names(kyphosis))
[1] F

> is.null(names(letters))
[1] T

For more details on functions such as is.na and is.null, see the
section Testing and Coercing Data (page 91).
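
As a brief sketch of how is.na can be combined with the comparisons above, the following commands drop the missing values before testing, so that the result is a definite logical value rather than NA. The vector y is the one defined earlier in this section:

```r
y <- c(3, NA, 4)

# Keep only the non-missing values, then test them.
all(y[!is.na(y)] > 0)    # TRUE

# Count the missing values.
sum(is.na(y))            # 1
```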

Assignments

As we have mentioned, data objects are created in S-PLUS by
assigning values to names. We saw in the section Syntax of S-PLUS
Expressions (page 7) that legal names consist of letters, numbers, and
periods, and cannot begin with a number. The most common form of
assignment in S-PLUS uses the left assignment operator <-, which may
also be written as the equals sign = or a single underscore _ to save
typing. The standard syntax is one of three forms:
• name <- expression
• name = expression
• name _ expression

S-PLUS interprets the expression on the right side of the assignment
operator and returns a value. The value is then assigned to the name
on the left side of the operator.
Because the underscore is an S-PLUS assignment operator, it is
extremely important to remember that it cannot be used in function
and object names, unlike in many other languages. In addition, it is
deprecated as an assignment operator, so it may not be supported in
future releases of S-PLUS. See the section Syntax of S-PLUS
Expressions (page 7) for more details.


Warning

In addition to object assignments, the equals sign is used for argument assignments within a
function definition. Because of this, there are some ambiguities that you must be aware of when
using the equals sign as an assignment operator. For example, the command
> print(x <- myfunc(y))
assigns the value from myfunc(y) to the object x and then prints x. Conversely, the command
> print(x = myfunc(y))
simply prints the value of myfunc(y) and does not perform an assignment. This is because the
print function has an argument named x, and argument assignment takes precedence over
object assignment with the equals sign. Because of these ambiguities, we discourage the use of the
equals sign for left assignment.

Assignments made at the S-PLUS prompt are performed in the current
working directory. Assignments within functions are local, and are
performed in the frame in which the function is evaluated. This means
that you can freely assign values to names within functions without
overwriting existing objects that might share the same name. Frames
are discussed in full in the section Frames and Argument Evaluation
(page 889).
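
The local nature of assignments within functions can be illustrated with a short sketch. The object total and the function add.up are hypothetical names chosen for this example:

```r
total <- 100            # assigned at the prompt

# Hypothetical function: its "total" is local to the function's frame.
add.up <- function(v) {
    total <- sum(v)     # does not touch the object created above
    total
}

add.up(1:4)             # returns 10
total                   # still 100
```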
Equivalent to the left assignment operator is the right assignment
operator, which appears in the form expression -> name. Right
assignment is convenient when you type a complicated expression at
the S-PLUS prompt and then realize you’ve forgotten to assign a name
to the return value. S-PLUS also protects you from such forgetfulness
by storing the last unassigned value in the .Last.value object in your
working data directory. For consistency, we recommend that you
always use left assignment within functions. If you use right
assignment in a function definition and then view the code later, you
will see that S-PLUS automatically reformats the function to use left
assignment.
The permanent assignment operator <<- is like <-, except that it
always writes to the working directory. Thus, it allows you to make
permanent assignments from within functions. However, permanent
assignment inside a function produces a side effect, in that existing
objects of the same name in your working data directory are
overwritten. This can lead to lost data. For this reason, we discourage
the use of <<- within functions.
A more general form of assignment uses the assign function. The
assign function allows you to choose where the assignment takes
place. You can assign an object to either a position in the search list or
a particular frame. For example, the following command assigns the
value 3 to the name boo in frame 0, the session frame:

> assign("boo", 3, frame=0)

The assign function can be used to write to permanent directories. As
with <<-, we discourage such use within functions because permanent
assignments have potentially dangerous side effects.

Testing and Coercing Data

Most functions expect input data of a particular type. For example,
mathematical functions expect numeric input while text processing
functions expect character input. Other functions are designed to
work with a wide variety of input data and have internal branches
that use the data type of the input to determine what to do.
Unexpected data types can often cause a function to stop and return
error messages. To protect against this behavior, many functions
include expressions that test whether the input data is of the right type
and coerce the data if necessary. For example, mathematical functions
frequently have conditionals of the following form:

if(!is(x, "numeric")) x <- as(x, "numeric")

This statement tests the input data x with the is function. If x is not
numeric, it is coerced to a numeric object with the as function.
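
As an illustration, the conditional can be placed at the top of a function body so that the rest of the function can rely on numeric input. This is only a sketch; the function name safe.sqrt is hypothetical:

```r
# Hypothetical function: coerce the input to numeric if necessary,
# then take square roots.
safe.sqrt <- function(x) {
    if(!is(x, "numeric")) x <- as(x, "numeric")
    sqrt(x)
}

safe.sqrt(c(4, 9))        # numeric input is used as is
safe.sqrt(c("4", "9"))    # character input is coerced first
```

Both calls are expected to give the same result, since the character strings are converted to the numbers 4 and 9 before sqrt is applied.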
As we discuss in Chapter 1, The S-PLUS Language, older versions of
S-PLUS (S-PLUS 3.x, 4.x, and 2000) were based on version 3 of the S
language (SV3). Most testing of SV3 objects is done with functions
having names of the form is.type, where type is a recognized data
type. For example, the functions is.vector and is.matrix test
whether the data type of an object is a vector and a matrix,
respectively. Functions also exist to test for special values such as NULL
and NA; see the section Comparison and Logical Operators (page 86)
for more information.

Coercion of SV3 objects can be performed using functions with
names of the form as.type, such as as.vector and as.matrix.
Coercion using the as.type functions is very strong, however, and
can lead to loss of information; see the section Coercion of Values
(page 23) for a full discussion. If all you need is to ensure that atomic
data is of the proper mode, you can do this explicitly as follows:

> mode(x) <- "type"

For a list of atomic modes, see the help file for the mode function.
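
For instance, the following sketch changes the mode of a small numeric vector to "character". The vector v is a hypothetical object created for the example:

```r
v <- c(1, 2, 3)
mode(v)                  # "numeric"

mode(v) <- "character"   # change the mode in place
mode(v)                  # "character"
v                        # the values are now character strings
```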
Newer versions of S-PLUS (S-PLUS 5.x and later) are based on version
4 of the S language (SV4), which implements a vastly different
approach to classes. In SV4, the is.type and as.type functions are
collapsed into the simpler is and as functions. For example, to test
whether an object x is numeric, type:

> is(x, "numeric")

Similarly, to coerce x to have a character data type, use the following
command:

> as(x, "character")

The is and as functions are backwards compatible and can be used
with data objects created in earlier versions of S-PLUS.
Objects can be tested in a more general way using the inherits
function. For example, if you have a class called myclass, you can test
an object x for membership in the class using inherits as follows:

> inherits(x, "myclass")

For information on classes and inheritance, see Chapter 10,
Object-Oriented Programming in S-PLUS.
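
The following sketch shows inherits at work. It assumes, purely for illustration, that the class name is attached to a list with the oldClass replacement function; the object obj and the class names are hypothetical, and a class defined in another way may behave differently:

```r
# Hypothetical object: a list tagged with the class "myclass".
obj <- list(id = 1)
oldClass(obj) <- "myclass"

inherits(obj, "myclass")       # TRUE
inherits(obj, "otherclass")    # FALSE
```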
Table 4.5 lists the most common testing and coercing functions. The
first column gives the data type and the next two columns list the SV4
testing and coercing functions for the data type. The functions relating
to the three data types single, double, and integer are used to
modify the storage mode of numeric data. The storage mode of data is
important if you need to interface with C or Fortran routines, but can
safely be ignored otherwise.

Table 4.5: Common functions for testing and coercing data objects.

Type         Testing                 Coercion

array        is(x, "array")          as(x, "array")
character    is(x, "character")      as(x, "character")
complex      is(x, "complex")        as(x, "complex")
data frame   is(x, "data.frame")     as(x, "data.frame")
double       is(x, "double")         as(x, "double")
factor       is(x, "factor")         as(x, "factor")
integer      is(x, "integer")        as(x, "integer")
list         is(x, "list")           as(x, "list")
logical      is(x, "logical")        as(x, "logical")
matrix       is(x, "matrix")         as(x, "matrix")
numeric      is(x, "numeric")        as(x, "numeric")
single       is(x, "single")         as(x, "single")
vector       is(x, "vector")         as(x, "vector")


OPERATING ON SUBSETS OF DATA


Often, we want to perform calculations on only a subset of a data set.
The most useful method in S-PLUS for acting on a subset of data is
called subscripting. In general, subscripting is good S-PLUS
programming because it treats a data object as a whole rather than as
a collection of elements. In fact, subscripting is much more powerful
in S-PLUS than in other languages, and therefore should be mastered.
For a collection of good S-PLUS programming techniques, see the
section Organizing Computations (page 107).
In the following sections, we illustrate subscripting on a number of
common data structures. For vectors, matrices, and arrays, we use
square brackets [] to subset certain elements; for lists and data
frames, we also use the dollar sign $.

Subscripting Vectors

A vector is a set of values that can be thought of as a one-dimensional
array. (Note that this is simply a description, however; an S-PLUS
vector is not equivalent to an S-PLUS one-dimensional array.) A vector
subscript corresponds to an element’s position, or index, in the vector.
For example, the sixth element in a vector x has a subscript (or index)
of 6. You can subscript a data vector by providing a set of indices that
correspond to the elements you wish to keep. If y is a vector of
indices, x[y] returns the elements in x that correspond to the indices.
In S-PLUS, appropriate indices for subscripting vectors are
constructed automatically from information supplied in one of the
four following forms: a vector of positive integers, a vector of negative
integers, a logical vector, and a vector of character strings. We discuss
each of these in detail below. It is important to note that any S-PLUS
expression that evaluates to an appropriate subscript value can be
included in the square brackets. This flexibility makes subscripting a
very powerful programming tool in S-PLUS.

Subscripting with positive integers


If you supply a set of positive integers to subscript a vector, S-PLUS
interprets the integers as the indices of the elements that you want to
keep. To illustrate this, suppose we have a vector x:

> x <- c(1.9, 3.0, 4.1, 2.6, 3.6, 2.3, 2.8, 3.2, 6.6,
+ 7.6, 7.4, 1.0)

> x
[1] 1.9 3.0 4.1 2.6 3.6 2.3 2.8 3.2 6.6 7.6 7.4 1.0

The following command returns the third element of x:

> x[3]
[1] 4.1

The next command returns the third, fifth, and ninth elements:

> x[c(3,5,9)]
[1] 4.1 3.6 6.6

Note that the indices do not need to be unique:

> x[c(5,5,8)]
[1] 3.6 3.6 3.2

In addition, the indices do not need to be given in increasing order.
Since x has twelve elements, the following returns x in reverse order:

> x[12:1]
[1] 1.0 7.4 7.6 6.6 3.2 2.8 2.3 3.6 2.6 4.1 3.0 1.9

To determine the total number of elements in a vector, use the
function length. This function returns the number of elements in
atomic objects such as vectors and matrices, and it returns the number
of components in recursive objects such as lists. If the requested index
for a vector x is greater than length(x), S-PLUS returns NA to indicate
a missing value.
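
Because any expression that evaluates to suitable indices can be used as a subscript, length is convenient for referring to the end of a vector without knowing its size in advance. A short sketch using the vector x defined above:

```r
x <- c(1.9, 3.0, 4.1, 2.6, 3.6, 2.3, 2.8, 3.2, 6.6,
    7.6, 7.4, 1.0)

n <- length(x)    # 12
x[n]              # last element: 1.0
x[(n-2):n]        # last three elements: 7.6 7.4 1.0
x[n + 1]          # NA, since x has no thirteenth element
```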

Subscripting with negative integers


If you supply a set of negative integers to subscript a vector, S-PLUS
interprets them as the indices of the elements that you want to
exclude from the result. All elements in the original vector are
returned in order, with the exception of those corresponding to the
indices you specify. For example, the following command returns all
elements in x except for the third, fourth, and fifth:

> x[-(3:5)]
[1] 1.9 3.0 2.3 2.8 3.2 6.6 7.6 7.4 1.0

Specifying an index greater than length(x) has no effect:

> x[-13]
[1] 1.9 3.0 4.1 2.6 3.6 2.3 2.8 3.2 6.6 7.6 7.4 1.0

The entire vector x is returned by this command, since x has only 12
elements.
Note that you cannot combine positive and negative integers to
subscript a vector. For example, the command x[c(3,-5,9)] returns
an error.
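
Negative subscripts can also be built from expressions. As a small sketch, the following commands trim elements from the ends of the vector x used throughout this section:

```r
x <- c(1.9, 3.0, 4.1, 2.6, 3.6, 2.3, 2.8, 3.2, 6.6,
    7.6, 7.4, 1.0)

# Drop the last element.
x[-length(x)]

# Drop both the first and the last elements.
trimmed <- x[-c(1, length(x))]
length(trimmed)     # 10
```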

Subscripting with logical values


If you supply a set of logical values to subscript a vector, S-PLUS
interprets the positions of the TRUE values as the indices of the
elements that you want to keep. All elements in the original vector
are returned in order, with the exception of those corresponding to
the FALSE indices. For example, the second command below returns
all elements in x that are greater than 2. Equivalently, it returns all
elements in x for which the logical vector x > 2, shown by the first
command, is TRUE:

> x > 2
[1] F T T T T T T T T T T F

> x[x > 2]
[1] 3.0 4.1 2.6 3.6 2.3 2.8 3.2 6.6 7.6 7.4

The next command returns the elements in x that are between 2 and
4:

> x[x>2 & x<4]
[1] 3.0 2.6 3.6 2.3 2.8 3.2

Logical index vectors are generally the same length as the vectors to
be subscripted. However, this is not a strict requirement, as S-PLUS
recycles the values in a short logical vector so that its length matches a
longer vector. Thus, you can use the following command to extract
every third element from x:

> x[c(F,F,T)]
[1] 4.1 2.3 6.6 1.0

The index vector c(F,F,T) is repeated four times so that its length
matches the length of x. Likewise, the following command extracts
every fifth element from x:

> x[c(F,F,F,F,T)]
[1] 3.6 7.6

In this case, the index vector is repeated three times, and no values
are returned for indices greater than length(x).
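
Two further uses of logical vectors follow as a sketch. The first relies on recycling to pick the elements in odd positions; the second uses the fact that TRUE counts as 1 and FALSE as 0 when a logical vector is summed:

```r
x <- c(1.9, 3.0, 4.1, 2.6, 3.6, 2.3, 2.8, 3.2, 6.6,
    7.6, 7.4, 1.0)

# c(TRUE, FALSE) is recycled six times: positions 1, 3, 5, ...
odd <- x[c(TRUE, FALSE)]
odd                 # 1.9 4.1 3.6 2.8 6.6 7.4

# Number of elements greater than 2.
sum(x > 2)          # 10
```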

Subscripting with character values


When you supply a set of character values to subscript a vector, the
values must be from the vector’s names attribute. Thus, this
subscripting technique requires the vector to have a non-null names
attribute. S-PLUS matches the names you specify with those in the
names attribute, and returns the corresponding elements of the vector.

For example, consider the built-in vector state.abb, which contains
the postal abbreviations for all fifty states in the USA. Note that
state.abb has no names by default:

> length(state.abb)
[1] 50

> names(state.abb)
NULL

We can use the "names<-" replacement function to assign names to
state.abb. The names we choose are located in the vector
state.name, which contains the full names of all fifty states:

> length(state.name)
[1] 50

> state.name
 [1] "Alabama"     "Alaska"      "Arizona"
 [4] "Arkansas"    "California"  "Colorado"
 [7] "Connecticut" "Delaware"    "Florida"
[10] . . .

Before modifying a built-in data object, we must create a local copy of
it in our working directory:

> state.abb <- state.abb
> names(state.abb) <- state.name

Finally, we can subscript state.abb directly with character vectors.
The following command returns the postal abbreviations of Alaska
and Hawaii:

> state.abb[c("Alaska", "Hawaii")]
 Alaska Hawaii
   "AK"   "HI"

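The same technique works for any vector with a names attribute. The following sketch uses a small hypothetical vector, price, rather than a built-in data set. Note that the elements are returned in the order given in the subscript, not in the order in which they are stored:

```r
# Hypothetical data: three prices, named by item.
price <- c(1.25, 0.80, 2.10)
names(price) <- c("apple", "pear", "plum")

price["pear"]                         # the element named "pear"
picked <- price[c("plum", "apple")]   # order follows the subscript
names(picked)                         # "plum" "apple"
```
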
Subscripting Matrices and Arrays

Subscripting data sets that are matrices or arrays is very similar to
subscripting vectors. In fact, you can subscript them exactly like
vectors if you keep in mind that arrays are stored in column-major
order. You can think of the data values in an array as being stored in
one long vector that has a dim attribute to specify the array’s shape.
Column-major order states that the data values fill the array so that
the first index changes the fastest and the last index changes the
slowest. For matrices, this means that data values are filled in column-
by-column.
For example, suppose we have the following matrix M:

> M <- matrix(c(12,1,19,15,9,14,6,2,11,10,7,19), nrow=3)
> M
     [,1] [,2] [,3] [,4]
[1,]   12   15    6   10
[2,]    1    9    2    7
[3,]   19   14   11   19

We can extract the eighth element of M as follows:

> M[8]
[1] 2

This corresponds to the element in the second row and third column
of M. When a matrix is subscripted in this way, the element returned is
a single number without dimension attributes. Thus, S-PLUS does not
recognize it as a matrix.
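
The link between a single vector-style index and a row and column pair follows directly from column-major order. As a sketch, for a matrix with nrow(M) rows, the element in row i and column j sits at position (j - 1) * nrow(M) + i of the underlying vector. The names i, j, and k below are chosen for this example only:

```r
M <- matrix(c(12,1,19,15,9,14,6,2,11,10,7,19), nrow=3)

i <- 2                        # row
j <- 3                        # column
k <- (j - 1) * nrow(M) + i    # position in column-major order
k                             # 8
M[k] == M[i, j]               # TRUE: both refer to the same element
```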
S-PLUS also lets you use the structure of arrays to your advantage by
allowing you to specify one subscript for each dimension. Since
matrices have two dimensions, you can specify two subscripts inside
the square brackets. The matrix subscripts correspond to the row and
column indices, respectively:

> M[2,3]
[1] 2

As with vectors, array subscripts can be positive integers, negative
integers, logical vectors, or character vectors if appropriate. The
following command returns a 2 × 2 submatrix of M, consisting of the
first and third rows and the second and fourth columns:

> M[c(1,3), c(2,4)]
     [,1] [,2]
[1,]   15   10
[2,]   14   19

The next command returns values from the same two columns,
including all rows except the first:

> M[-1, c(2,4)]
     [,1] [,2]
[1,]    9    7
[2,]   14   19

The next example illustrates how you can use a logical vector to
subscript a matrix or array. We use the built-in data matrix state.x77,
which contains demographic information on all fifty states in the
USA. The third column of the matrix, Illiteracy, gives the percent
of the population in a given state that was illiterate at the time of the
1970 census. We first copy this column into an object named illit:

> dim(state.x77)
[1] 50 8

> illit <- state.x77[1:50, 3]

Next, we subscript state.x77 on the illit values that are greater
than two:

> state.x77[illit > 2, 3:5]
               Illiteracy Life Exp Murder
Alabama               2.1    69.05   15.1
Louisiana             2.8    68.76   13.2
Mississippi           2.4    68.09   12.5
New Mexico            2.2    70.32    9.7
South Carolina        2.3    67.96   11.6
Texas                 2.2    70.90   12.2

In the above command, the subscript illit > 2 results in a logical
vector of length 50. The returned values are in rows for which
illit > 2 is TRUE, and are from the third, fourth, and fifth columns of
state.x77.

It is also possible to subscript matrices and arrays by supplying
character values that specify indices. The supplied values must be
from the array’s dimnames attribute. S-PLUS matches the names you
specify with those in the dimnames attribute and returns the
corresponding elements of the array. For example, the commands
below display the dimnames of state.x77 and then return the element
in the row named Arizona and the column named Area:

> dimnames(state.x77)
[[1]]:
 [1] "Alabama"     "Alaska"      "Arizona"
 [4] "Arkansas"    "California"  "Colorado"
 [7] "Connecticut" "Delaware"    "Florida"
[10] . . .

[[2]]:
[1] "Population" "Income"     "Illiteracy" "Life.Exp"
[5] "Murder"     "HS.Grad"    "Frost"      "Area"

> state.x77["Arizona", "Area"]
[1] 113417

Note that if the subscript for a given dimension is omitted, all
indices along that dimension are assumed. Thus, we can construct the
illit object with the simpler command:

> illit <- state.x77[, 3]

As with vectors, array subscripts can be any expression that evaluates
to an appropriate set of index values.
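
As a sketch of subscripts built from expressions, the commands below use the small matrix M from earlier in this section. The first selects the rows whose first-column value exceeds 5; the second uses ncol to select the last column without hard-coding its index:

```r
M <- matrix(c(12,1,19,15,9,14,6,2,11,10,7,19), nrow=3)

# Rows 1 and 3 have first-column values (12 and 19) greater than 5.
big.rows <- M[M[, 1] > 5, ]
dim(big.rows)          # 2 4

# The last column of M, whatever the number of columns.
M[, ncol(M)]           # 10 7 19
```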

Dropping Indices from Arrays

By default, S-PLUS drops array dimensions whenever subscripting
results in a lower-dimensional object. Thus, if you subscript a single
column or a single value from a matrix, the returned object is a vector
instead of a matrix. You can see this with the matrix M that we defined
above:

> M[1,3]
[1] 6

To override this behavior, use the drop argument to the subscripting
functions. By default, drop=TRUE and dimensions are dropped
whenever possible. The command below sets drop=FALSE and thus
keeps the matrix dimensions:

> K <- M[1, 3, drop=FALSE]
> K
     [,1]
[1,]    6

> is(K, "matrix")
[1] T

> dim(K)
[1] 1 1
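
The drop argument matters most when later code expects a matrix, for example code that calls dim or nrow on the result. The following sketch contrasts the two behaviors when a single column of M is extracted; the names col2 and col2.mat are hypothetical:

```r
M <- matrix(c(12,1,19,15,9,14,6,2,11,10,7,19), nrow=3)

col2 <- M[, 2]                     # dimensions dropped: a plain vector
col2.mat <- M[, 2, drop=FALSE]     # dimensions kept: a 3 x 1 matrix

is.null(dim(col2))                 # TRUE
dim(col2.mat)                      # 3 1
```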

Subscripting Arrays with Matrices

In general, operating on arrays of data is more complicated than
operating on simple vectors. One problem, as we discuss above, is
that subscripting can sometimes collapse your data. Another problem
is that subscripting an n-dimensional array with n subscripts yields
only rectangular data sets. Often, you need to extract more irregular
subsets of arrays. You can do this by supplying a subscript matrix that
represents the positions of the individual elements you wish to keep.
For example, suppose we want to extract two elements of M: the
element in row 1 and column 2, and the element in row 3 and column
3. We can do this directly with the following command:

> c(M[1,2], M[3,3])
[1] 15 11

More generally, we do this by subscripting with the following matrix:

> subscr.mat <- matrix(c(1,2,3,3), ncol=2, byrow=T)
> subscr.mat
     [,1] [,2]
[1,]    1    2
[2,]    3    3

> M[subscr.mat]
[1] 15 11

Subscript matrices such as subscr.mat have as many columns as there
are dimensions in the array. Each element you want to extract from
the array corresponds to a row in the subscript matrix, and the entries
in each row are the indices of the element.
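
As a further sketch, a subscript matrix whose rows are (1,1), (2,2), and (3,3) extracts the leading diagonal elements of M, a subset that cannot be obtained with a single pair of row and column subscripts. The name diag.index is hypothetical:

```r
M <- matrix(c(12,1,19,15,9,14,6,2,11,10,7,19), nrow=3)

# One row per element to extract: (row index, column index).
diag.index <- cbind(1:3, 1:3)
M[diag.index]          # 12 9 11
```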

Subscripting Lists

Lists are vectors of class "list" that can hold arbitrary S-PLUS objects
as individual elements. For example:

> x <- c("Tom", "Dick", "Harry")
> mode(x)
[1] "character"

> mylist <- list(x = x)
> mylist[1]
$x:
[1] "Tom" "Dick" "Harry"

> mode(mylist[1])
[1] "list"

When it acts on vectors, the subscript operator [] returns a subvector
that has the same mode as the original vector. Thus, mylist[1] is a
list, just like the original object mylist. Yet the element x that we used
to build mylist is of mode "character". To extract the original
structure of a list element, use the operator [[]]:

> mylist[[1]]
[1] "Tom" "Dick" "Harry"

> mode(mylist[[1]])
[1] "character"

The subscript operator [[]] returns a single element of a vector; the
mode of the element may be different from the mode of the original
vector. Although it works on ordinary numeric and character vectors,
the operator [[]] is most useful on lists. For this reason, we refer to it
as the list subscript operator.
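
The distinction matters as soon as you compute with a list element. In the following sketch, record is a hypothetical list; the single-bracket form yields a list of length one, while the double-bracket form yields the numeric vector itself, which can be passed to a function such as mean:

```r
# Hypothetical list with one character and one numeric element.
record <- list(name = "Tom", scores = c(90, 85, 77))

mode(record[2])        # "list": a sublist of length 1
mode(record[[2]])      # "numeric": the element itself

mean(record[[2]])      # 84
```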
If the subscript for a list is itself a vector or a list, S-PLUS uses it
recursively. That is, the first element of the subscript extracts an
element from the top-level list in the object, the next subscript
element extracts from the first, and so on. For example, consider the
object biglist below:

> biglist <- list(
+ lista = list(
+ list1 = list(x=1:10, y=10:20),
+ list2 = letters),
+ listb = list("a", "r", "e"))

> biglist

$lista:
$lista$list1:
$lista$list1$x:
[1] 1 2 3 4 5 6 7 8 9 10

$lista$list1$y:
[1] 10 11 12 13 14 15 16 17 18 19 20

$lista$list2:
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m"
[14] "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"

$listb:
$listb[[1]]:
[1] "a"

$listb[[2]]:
[1] "r"

$listb[[3]]:
[1] "e"

Suppose we want to extract the element y from biglist. The lista
element is the first in biglist, the list1 element is the first in lista,
and y is the second element in list1. Thus, the following expression
returns the contents of y:

> biglist[[1]][[1]][[2]]
[1] 10 11 12 13 14 15 16 17 18 19 20

We can accomplish the same thing more compactly with the
following shorthand:

> biglist[[c(1,1,2)]]
[1] 10 11 12 13 14 15 16 17 18 19 20

If the elements of a list are named, the named elements are called
components and can be extracted by either the list subscript operator or
the component operator $. For example:

> mylist$x
[1] "Tom" "Dick" "Harry"

Note that the component operator returns the original structure of a
list element, just like the list subscript operator:

> mode(mylist$x)
[1] "character"

You can extract components of embedded lists with nested use of the
component operator:

> biglist$lista$list2

[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m"
[14] "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"

You can also supply a vector of component names to the list subscript
operator. The effect is the same as supplying a vector of component
numbers, as in the biglist[[c(1,1,2)]] command above. For
example, the following extracts the list2 component of lista in
biglist:

> biglist[[c("lista","list2")]]

[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m"
[14] "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"

The component operator and the list subscript operator are
equivalent. You can use them interchangeably, and you can use both
operators in a single command:

> biglist[["lista"]]$list1
$x:
[1] 1 2 3 4 5 6 7 8 9 10

$y:
[1] 10 11 12 13 14 15 16 17 18 19 20

Subscripting Data Frames

Data frames share characteristics of both matrices and lists. Thus,
subscripting data frames shares characteristics of subscripting both
matrices and lists. In the examples below, we illustrate the ways in
which you can subscript data frames.
First, we form a data frame from numerous built-in data sets that
contain information on the 50 states in the USA:

> state.data <- data.frame(state.abb, state.center,
+ state.region, state.division, state.x77,
+ row.names = state.name)

The data frame state.data has the rectangular layout of a matrix, with
50 rows and 13 columns, but S-PLUS also recognizes it as a list. For
most subscripting purposes, it is easiest to treat data frames as matrices
and extract elements with the matrix subscript notation. For example,
the following command returns a 3 × 3 data frame using matrix
subscripts:

> state.data[5:7, 6:8]
            Population Income Illiteracy
California       21198   5114        1.1
Colorado          2541   4884        0.7
Connecticut       3100   5348        1.1

Like lists, data frames also have components that you can access with
the component operator. Data frame components are the named
columns and can be accessed just like list components. For example,
the following command returns the Population column of
state.data:

> state.data$Population
 [1]  3615   365  2212  2110 21198  2541  3100   579
 [9]  8277  4931   868   813 11197  5313  2861  2280
[17]  3387  3806  1058  4122  5814  9111  3921  2341
[25]  4767   746  1544   590   812  7333  1144 18076
[33]  5441   637 10735  2715  2284 11860   931  2816
[41]   681  4173 12237  1203   472  4981  3559  1799
[49]  4589   376

You can also supply a vector of component names to subscript
particular columns from a data frame. The following command
returns a data frame containing both the Population and Area
columns from state.data:

> state.data[, c("Population", "Area")]
            Population   Area
Alabama           3615  50708
Alaska             365 566432
Arizona           2212 113417
Arkansas          2110  51945
California       21198 156361
Colorado          2541 103766
Connecticut       3100   4862
. . .
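
The matrix and list styles of subscripting can be combined in a single expression. The sketch below uses a small hypothetical data frame, people, so that the results are easy to check by eye; a logical vector built from one column selects the rows, and a character vector selects the columns:

```r
# Hypothetical data frame with three rows and two columns.
people <- data.frame(height = c(150, 172, 181),
    weight = c(52, 70, 88),
    row.names = c("Ann", "Bob", "Cy"))

# Rows with height over 160, columns chosen by name.
tall <- people[people$height > 160, c("weight", "height")]
dim(tall)                       # 2 2

# A single value, addressed by row name and column name.
people["Bob", "weight"]         # 70
```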


ORGANIZING COMPUTATIONS
As with any programming task, the key to successful S-PLUS
programming is to organize your computations before you start.
Break the problem into pieces and use the appropriate tools to
complete each piece. Be sure to take advantage of existing functions
rather than writing new code to perform routine tasks.
S-PLUS programming in particular requires one additional bit of
wisdom that is crucial: treat every object as a whole. Treating objects as
whole entities is the basis for vectorized computation. You should
avoid operating on individual observations, as such computations in
S-PLUS carry a high premium in both memory use and processing
time. Operating on whole objects is made simpler by a very flexible
subscripting capability, as we discuss in the previous section. In most
cases where for loops (or other loop constructs) seem the most
natural way to access individual data elements, you will gain
significantly in performance by using some form of subscripting.
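
As a sketch of this advice, the two fragments below compute the sum of squares of the elements of x that are greater than 2. The first accesses one element at a time in a for loop; the second treats x as a whole by using a logical subscript. The names total and total.vec are hypothetical:

```r
x <- c(1.9, 3.0, 4.1, 2.6, 3.6, 2.3, 2.8, 3.2, 6.6,
    7.6, 7.4, 1.0)

# Element-by-element version.
total <- 0
for(i in 1:length(x)) {
    if(x[i] > 2) total <- total + x[i]^2
}

# Whole-object version: one expression, no explicit loop.
total.vec <- sum(x[x > 2]^2)

# The two results agree up to rounding error.
abs(total - total.vec) < 1e-8
```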
In this section, we provide some high-level suggestions for good
S-PLUS programming style. In addition, we discuss common control
functions such as if, ifelse, and return.

Programming Style

Most programmers have been exposed to some general rules of
programming style before they attempt to write functions in the
S-PLUS language. “Avoid Goto’s,” “Use top-down design,” and “Keep
it modular” are some catch-phrases that embody style guidelines.
Most of the style guidelines you’ve come to swear by are also
applicable to S-PLUS, including:
• Modularize your code. If you need to design a large function, see
if you can use smaller functions to do most of the work. This
reduces the size of the larger function, which makes it easier
to understand and debug. This approach also allows each of
the smaller functions to be reused for other purposes.
• Comment your code. Comments are useful guides to the design
of a function, particularly when you use an unusual or
unfamiliar construction. Without comments, you may not be
able to decipher your own code in six months, and it may be
completely opaque to anyone else who tries to read it.

The S-PLUS language designates comments with the pound
sign #. Be sure to read the section Notes Regarding
Commented Code (page 120) before including comments in
your S-PLUS functions.
• Document your code. If you distribute your functions to other
users, include help files describing them that contain
complete descriptions of arguments and return values. The
prompt function can be used to create a skeletal help file for
any S-PLUS object. For more details, see the sections on
creating help files in Chapter 4, Writing Functions in S-PLUS.
• Use existing functions. If you already know of a function that
performs a certain task, use it instead of rewriting it. S-PLUS
includes over four thousand built-in functions, most of which
can be used to good effect in your own functions.
• Use parentheses to make groupings explicit. If you’re a
sophisticated user, you can use the precedence of operations
to your advantage in writing quick functions. If you plan to
maintain such functions, however, it is clearer to have the
precedence explicit with parentheses. For details, see the
section Function Names and Operators (page 75).
• Avoid unnecessary looping. As we mention above, a key point in
S-PLUS programming is to treat data objects as whole objects.
When you begin to use a for loop, ask yourself if the loop can
be eliminated in favor of a single expression that operates on
the whole object. For atomic objects such as vectors and
matrices, this is almost always possible. Lists, however,
sometimes require for loops to access the individual list
elements. The lapply function performs this looping for you;
you should use it instead of explicitly constructing your own
loops. Chapter 21, Using Less Time and Memory, discusses
several techniques for avoiding loops in S-PLUS code,
including the use of lapply.
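
The guideline on unnecessary looping can be sketched as follows for a list. The list samples is hypothetical; lapply applies mean to each component and returns a list of results with the same component names, so no explicit for loop is needed:

```r
# Hypothetical list of numeric vectors of different lengths.
samples <- list(a = c(1, 2, 3), b = c(10, 20), d = 5)

# Apply mean to every component of the list.
means <- lapply(samples, mean)

means$a      # 2
means$b      # 15
means$d      # 5
```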
Many other useful programming rules can be found in introductory
programming texts.

Flow of Control

Normally, S-PLUS expressions are evaluated sequentially. Groups of
expressions can be collected within curly braces {}. Such groups are
treated as a single S-PLUS expression, but within the expression
evaluation again proceeds sequentially. You can override the normal
flow of control with the constructions presented in Table 4.6. In the
subsections that follow, we discuss each of these in detail.
Table 4.6: S-PLUS constructions that allow you to override the normal flow of control.

Construction                  Description

if(condition) {expression}    Evaluates condition. If true, evaluates
                              expression.

if(condition) {expression1}   Evaluates condition. If true, evaluates
else {expression2}            expression1. If false, evaluates expression2.

ifelse(condition,             Vectorized version of the if statement.
expression1, expression2)     Evaluates condition and returns elements of
                              expression1 for true values and elements of
                              expression2 for false values.

switch(expression, ...)       Evaluates expression, which must return
                              either a character or numeric value. The
                              value of expression is compared to the
                              remaining arguments: if it matches one
                              exactly, the value of the evaluated argument
                              is returned.

break                         Terminates the current loop and passes
                              control out of the loop.

next                          Terminates the current iteration of the loop
                              and immediately starts the next iteration.

return(expression)            Terminates the current function and
                              immediately returns the value of expression.

stop(message)                 Signals an error by terminating evaluation of
                              the current function, printing the character
                              string contained in message, and returning to
                              the S-PLUS prompt.

while(condition)              Evaluates condition. If true, evaluates
{expression}                  expression then repeats the loop, evaluating
                              condition again.

repeat {expression}           Simpler version of the while statement. No
                              tests are performed and expression is
                              evaluated indefinitely. Because repeat
                              statements have no natural termination, they
                              should contain break, return and/or stop
                              statements.

for(name in expression1)      Evaluates expression2 once for each name in
{expression2}                 expression1. Although for loops are widely
                              used in most programming languages, they are
                              generally less efficient in S-PLUS than
                              vectorized calculations.

The if and stop Statements

The if statement is the most common branching construction in
S-PLUS. The syntax is simple:

if(condition) { expression }

Here, condition is any S-PLUS expression that evaluates to a logical
value, and expression is the expression that is evaluated if condition
is true. As with function bodies, expression needs to be braced only if
it contains multiple statements. We suggest, however, that you include
braces at all times for consistency and maintainability.
You can use if statements to screen input data for suitability. For
example, the following issues an error if the input data x is not
numeric:

if (!is(x,"numeric"))
stop("Data must be of mode numeric")


The stop function stops evaluation of the calling function at the point
where stop occurs. It takes a single argument that should evaluate to
a character string. If such an argument is supplied, the string is printed
to the screen as the text of an error message. For example, under
normal error handling, the above example yields the following output
if x is not numeric:

Error: Data must be of mode numeric
Dumped

We discuss the stop function more in the section Error Handling
(page 127).
Another common use of if statements is in missing-argument
handling within function definitions:

if(missing(y)) y <- sqrt(x)

This statement uses the missing function to check whether the
argument y is missing; if it is, the value sqrt(x) is assigned to y.
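The two uses of if described above, input screening and
missing-argument handling, can be combined in one small function. The
following is a minimal sketch only: root.sum is a hypothetical name
that is not part of S-PLUS, and it uses is.numeric for the mode test
in place of the is(x, "numeric") form shown earlier.

```r
# Hypothetical helper for illustration; not part of S-PLUS.
root.sum <- function(x, y)
{
    # Screen the input before doing any work.
    if(!is.numeric(x)) {
        stop("Data must be of mode numeric")
    }
    # Supply a value for y only when the caller omitted it.
    if(missing(y)) {
        y <- sqrt(x)
    }
    sum(x + y)
}

r1 <- root.sum(c(1, 4, 9))       # y is computed as sqrt(x)
r2 <- root.sum(c(1, 4, 9), 0)    # y is supplied explicitly
```

Calling root.sum with a character vector reaches the stop call and
never evaluates the remaining statements.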
When constructing conditions within if statements, you may want to
test multiple conditions at once. You can use the two conditional
operators, && and ||, for logical AND and OR operations,
respectively. The syntax is:

if(condition1 && condition2) { expression }
if(condition1 || condition2) { expression }

These operators evaluate only as far as necessary to return a value.
For example, the && operator first evaluates condition1. If it is true,
then condition2 is evaluated and the result is returned as the value of
the condition statement. If condition1 is false, however, condition2
is not evaluated and S-PLUS returns FALSE for the entire statement.
Similarly, the || operator evaluates only until it encounters a true
statement and then returns TRUE. It returns FALSE only if every
condition is false.
Do not confuse the vectorized AND and OR operators (& and |) with
the conditional operators discussed here. The vectorized operators
return vectors of logical values, while conditionals return a single
logical value. For more details, see the section Comparison and
Logical Operators (page 86).
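As a brief illustration of the difference, the sketch below uses &&
to guard a test that would not make sense on an empty object, and &
to compare two logical vectors element by element. The object names
are chosen for this example only.

```r
x <- NULL
# && stops at the first false condition, so x[1] > 0 is never
# evaluated when x has length zero.
ok <- length(x) > 0 && x[1] > 0

v <- c(-1, 2, 5)
# & works element by element and returns a logical vector.
both <- (v > 0) & (v < 4)
```

Here ok is a single logical value, while both has one element for
each element of v.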


Note

S-PLUS recognizes NA as a logical value, giving three possibilities for logical data: TRUE, FALSE, and
NA. If an if statement encounters NA, the calling function terminates and returns a message of the
following form:

Missing value where logical needed
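One way to avoid this error is to test for a missing value before the
comparison is made. The function below is a minimal sketch under that
assumption; is.positive is a hypothetical name and it expects a single
value, not a vector.

```r
# Hypothetical helper for illustration; expects a single value.
is.positive <- function(x)
{
    # Handle the missing case first so that the second if
    # statement never sees NA as its condition.
    if(is.na(x)) {
        return(NA)
    }
    if(x > 0) TRUE else FALSE
}
```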

Multiple Cases: The if and switch Statements

One of the most common uses of the if statement is to provide
branching for multiple cases. S-PLUS has no formal “case” statement,
so you often implement cases using the following general form:

if(case1) { expression1 }
else if(case2) { expression2 }
else if(case3) { expression3 }
. . .
else lastexpression

The idea is to identify each case in your function and have it
correspond to exactly one if or else statement. Such a construction
makes it easy to follow the cases and serves as a check that all cases
are covered. As an example, the simple function below generates a
set of random numbers from one of three distributions:

my.ran <- function(n, distribution, shape)
{
    #
    # A function to generate n random numbers.
    # If distribution="gamma", shape must be given.
    #
    if(distribution == "gamma") return(rgamma(n,shape))
    else if(distribution == "exp") return(rexp(n))
    else if(distribution == "norm") return(rnorm(n))
    else stop("distribution must be \"gamma\", \"exp\", or \"norm\"")
}

We must use the escape character \ in the stop message so that the
double quotes are recognized.


The switch function handles multiple cases in a slightly different way
than the if statement. The switch function accepts as its first
argument an S-PLUS expression that should evaluate to a character
string or numeric value. If the first argument is a character string, it is
compared to the remaining arguments and the value of the matched
argument is returned. If the first argument is an integer in the range
1:(nargs - 1), the integer i corresponds to the i+1st argument in the
list. Integers tend to hide the nature of the individual cases, however;
character values require that you label each individual case.
For example, we can rewrite my.ran using switch as follows:

my.ran2 <- function(n, distribution, shape)
{
    switch(distribution,
        gamma = rgamma(n,shape),
        exp = rexp(n),
        norm = rnorm(n),
        stop("distribution must be either \"gamma\", \"exp\", or \"norm\""))
}

When the my.ran2 function is called, the interpreter evaluates the
distribution argument. If it is one of the three character strings
"gamma", "exp", or "norm", the corresponding expression is evaluated.
Otherwise, the stop expression is evaluated.
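For comparison, here is a small sketch of switch with an integer
first argument, where the cases are selected by position and carry no
labels. The function name pick.summary and its argument k are
hypothetical and are used only for this illustration.

```r
# Hypothetical function: selects a summary statistic by position.
pick.summary <- function(x, k = 1)
{
    switch(k,
        mean(x),       # chosen when k is 1
        median(x),     # chosen when k is 2
        max(x))        # chosen when k is 3
}

s2 <- pick.summary(c(2, 4, 9), 2)
```

The comments are needed to show which case is which, which is the
drawback of integer selection noted above.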

The ifelse Statement

The ifelse statement is a vectorized version of the if statement. The
syntax is:

ifelse(condition, expression1, expression2)

The ifelse function evaluates both expression1 and expression2
and then returns the appropriate values from each based on the value
of condition. If an element of condition is true, ifelse returns the
corresponding value of expression1; otherwise, it returns the
corresponding value of expression2.
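A short sketch of this behavior follows. Because expression1 is
computed for the whole vector before any selection takes place, the
example takes abs(x) first so that sqrt is never applied to a
negative number. The object names are chosen for this example only.

```r
x <- c(4, -9, 16, NA)
# sqrt(abs(x)) is computed for every element; ifelse then keeps
# it where x >= 0 and substitutes 0 elsewhere.
safe.root <- ifelse(x >= 0, sqrt(abs(x)), 0)
```

The missing value in x produces a missing value in the result.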
The condition in an if statement must evaluate to a single logical
value, either TRUE or FALSE. Thus, to carry out operations that involve
multiple comparisons, the if statement needs to take place inside a
loop. For example, here is one implementation of the built-in sign
function, which accepts a numeric object and returns ± 1 depending
on the sign of the elements:


my.sign <- function(x)
{
    for(i in 1:length(x)) {
        if(x[i] > 0) { x[i] <- 1 }
        else if(x[i] < 0) { x[i] <- -1 }
    }
    return(x)
}

The ifelse function provides a method for evaluating a condition
over an entire vector or array of values without resorting to a for
loop. Here is a rewritten version of the my.sign function that uses
ifelse twice:

my.sign2 <- function(x)
{
    ifelse(x > 0, 1, ifelse(x < 0, -1, 0))
}

Not only is the version using ifelse much quicker, but it also handles
missing values:

> my.sign(c(1, 3, NA, -2))
Problem in my.sign: Missing value where logical needed:
if(x[i] > 0) { x[i] <- 1}
else if(x[i] < 0) { x[i] <- -1}

> my.sign2(c(1, 3, NA, -2))
[1] 1 1 NA -1

The ifelse function essentially uses subscripting, but includes some
extra steps so that it behaves correctly with NA values:

> ifelse
function(test, yes, no)
{
    answer <- test
    test <- as.logical(test)
    n <- length(answer)
    if(length(na <- which.na(test)))
        test[na] <- F
    answer[test] <- rep(yes, length.out = n)[test]
    if(length(na))
        test[na] <- T
    answer[!test] <- rep(no, length.out = n)[!test]
    answer
}

The idea is to perform a test on an object and replace those elements
for which the test is true with one value, while replacing the elements
for which the test is false with another value. The ifelse function sets
the subscripts corresponding to missing values to FALSE before
replacing the true elements, thus avoiding the error about missing
values that my.sign reports. It then resets those subscripts to TRUE
before replacing the false elements. The net result is that missing
values remain missing.

Warning

Note from the code above that ifelse subscripts using single numeric indices. Thus, it is designed
to work primarily with vectors and, as an extension, matrices. If you subscript a data frame with
a single index, S-PLUS treats the data frame as a list and returns an entire column; for this reason,
you should exercise care when using ifelse with data frames. For details on subscripting, see the
section Operating on Subsets of Data (page 94).

If our original data have no missing values, we can improve the
my.sign2 function further, using the original for loop in my.sign as a
hint. The telltale construction x[i] <- indicates that we can try
subscripting directly:

my.sign3 <- function(x)
{
    x[x > 0] <- 1
    x[x < 0] <- -1
    return(x)
}

For more hints on replacing for loops, see Chapter 21, Using Less
Time and Memory.

The break, next and return Statements

It is often either necessary or prudent to leave a loop before it reaches
its natural end. This is imperative in the case of a repeat statement,
which has no natural end. In S-PLUS, you exit loops using one of three
statements: break, next, and return. Of these, return exits not only
from the current loop, but also from the current function. The break
and next statements allow you to exit from loops in the following
ways:
• The break statement tells S-PLUS to exit from the current loop
and continue processing with the first expression following
the loop.
• The next statement tells S-PLUS to exit from the current
iteration of the loop and continue processing with the next
iteration.
For example, the function below simulates drawing a card from a
standard deck of 52 cards. If the card is not an ace, it is replaced and
another card is drawn. If the card is an ace, its suit is noted, it is
replaced, and another card is drawn. The process continues until all
four aces are drawn, at which time the function returns a statement of
how many draws it took to return all the aces.

draw.aces <- function()
{
    ndraws <- 0
    aces.drawn <- rep(F,4)
    repeat {
        draw <- sample(1:52, 1, replace=T)
        ndraws <- ndraws + 1
        if(draw %% 13 != 1)
            next
        aces.drawn[draw %/% 13 + 1] <- T
        if(all(aces.drawn))
            break
    }
    cat("It took", ndraws,
        "draws to draw all four of the aces!\n")
}

The repeat Statement

The repeat statement is the simplest looping construction in S-PLUS.
It performs no tests, but simply repeats a given expression
indefinitely. Because of this, the repeated expression should include a
way out, typically using either a break or return statement. The
syntax for repeat is:

repeat { expression }

For example, the function below uses Newton’s method to find the
positive, real jth roots of a number. A test for convergence is included
inside the loop and a break statement is used to exit from the loop.

newton <- function(n, j=2, x=1)
{
    #
    # Use Newton's method to find the positive, real
    # jth root of n starting at old.x == x.
    # The default is to find the square root of n
    # from old.x == 1.
    #
    old.x <- x
    repeat
    {
        new.x <- old.x-((old.x^j-n) / (j * old.x^(j-1)))
        # Compute relative error as a 2-norm.
        conv <- sum((new.x - old.x)^2 / old.x^2)
        if(conv < 1e-10)
            break
        old.x <- new.x
    }
    return(old.x)
}

The following command finds the square roots of the integers 4
through 9:

> newton(4:9)
[1] 2.000000 2.236068 2.449490 2.645751 2.828427 3.000000

To condense the code, we can replace the break statement inside the
loop with a return statement. This makes it clear what the returned
value is and avoids the need for any statements outside the loop:

newton2 <- function(n, j=2, x=1)
{
    old.x <- x
    repeat
    {
        new.x <- old.x-((old.x^j-n) / (j * old.x^(j-1)))
        conv <- sum((new.x - old.x)^2 / old.x^2)
        if(conv < 1e-10)
            return(old.x)
        old.x <- new.x
    }
}

Of course, such an abrupt departure from the function is undesirable
if additional calculations remain after the loop.

Note

The newton function is vectorized, as most S-PLUS functions should be. Thus, the convergence
criterion given above is not ideal for Newton’s method, since it does not check the convergence of
individual values. The code is provided here to illustrate the repeat and break statements; if you
wish to use the code in your work, you may want to experiment with different convergence
conditions.
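One possible alternative, offered only as a sketch, is to compute the
relative change of each element separately and to stop when every
element has converged. The function name newton.each and its tol
argument are hypothetical; they are not part of S-PLUS, and other
convergence rules may suit your data better.

```r
# Hypothetical variant of newton with a per-element test.
newton.each <- function(n, j = 2, x = 1, tol = 1e-10)
{
    # One starting value for each element of n.
    old.x <- rep(x, length.out = length(n))
    repeat {
        new.x <- old.x - ((old.x^j - n) / (j * old.x^(j - 1)))
        # Relative change of each element, checked separately.
        rel <- abs(new.x - old.x) / abs(old.x)
        if(all(rel < tol)) {
            return(new.x)
        }
        old.x <- new.x
    }
}

roots <- newton.each(4:9)
```

The sketch assumes positive values of n and a positive starting
value, as in the newton example above.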

The while Statement

You use the while statement to loop over an expression until a true
condition becomes false. The syntax is simple:

while(condition) { expression }

For example, the function below returns a vector that corresponds to
the binary representation of an integer.

bitstring <- function(n)
{
    tmp.string <- numeric(32)
    i <- 0
    while(n > 0) {
        tmp.string[32-i] <- n %% 2
        n <- n %/% 2
        i <- i + 1
    }
    firstone <- match(1, tmp.string)
    return(tmp.string[firstone:32])
}

In the bitstring code, n is made smaller in each iteration and
eventually becomes zero. We have no way of knowing beforehand
exactly how many times we need to execute the loop, so we use
while. Here is the result of calling bitstring with n=13:

> bitstring(13)
[1] 1 1 0 1

Note that the bitstring function is not vectorized. It accepts a single
integer value and does not work when the argument n is a numeric
vector.
Like the for statement, while is familiar to most programmers with
experience in other languages. And, like the for statement, it can
often be avoided in S-PLUS programming. You may need to use while
or for as a last resort in S-PLUS, but you should always try a
vectorized approach first.

The for Statement

Using for loops is a traditional programming technique that is fully
supported in S-PLUS. Thus, you can translate most Fortran-like DO
loops directly into S-PLUS for loops and expect them to work.
However, as we have stated, using for loops in S-PLUS is usually not a
good technique because loops do not treat data objects as whole
objects. Instead, they attack the individual elements of data objects,
which is often a less efficient approach in S-PLUS. You should always
be suspicious of lines in S-PLUS functions that have the following
form:

x[i] <- expression

Code with this structure can usually be implemented more efficiently
with subscripting.
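As a small illustration, the sketch below replaces the negative
elements of a vector with zero in two ways: first with an element by
element loop, and then with a single subscripted assignment. The
object names are chosen for this example only.

```r
x <- c(3, -1, 7, -4)

# Loop version: visits one element at a time.
y1 <- x
for(i in 1:length(y1)) {
    if(y1[i] < 0) y1[i] <- 0
}

# Subscripting version: one assignment on the whole vector.
y2 <- x
y2[y2 < 0] <- 0
```

Both versions give the same result, but the second treats x as a
whole object.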
The syntax of S-PLUS for loops is:

for(name in expression1) { expression2 }

S-PLUS evaluates expression2 once for each name in expression1,
where expression1 evaluates to a vector. For example:

for(i in 1:10) print(i)

The index variable (i in the above example) has scope only within
the body of the for loop.
Note that there are certain situations in which for loops may be
necessary in S-PLUS:
• when the calculation on the i+1st element in a vector or array
depends on the result of the same calculation on the ith
element.
• for some operations on lists. The lapply and sapply functions
perform some looping implicitly and may be more efficient
than loops you code yourself.
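The first situation can be sketched with a simple running
calculation in which each value is built from the one before it. The
coefficient 0.5 and the object names are arbitrary choices for this
illustration.

```r
e <- c(1, 2, 3, 4)
y <- numeric(length(e))
y[1] <- e[1]
# y[i] cannot be computed until y[i - 1] is known, so the
# elements must be filled in one at a time.
for(i in 2:length(e)) {
    y[i] <- 0.5 * y[i - 1] + e[i]
}
```

Because every step needs the result of the previous step, no single
subscripted assignment on y can replace this loop directly.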

Notes Regarding Commented Code

Comments within S-PLUS functions are sometimes roughly handled
by the interpreter. This is because S-PLUS attaches comments to the
beginning of the expressions that follow them. If no expression
follows a comment, it is not attached and will not be printed when
you view the function. For example, suppose we define a function
primes as follows:

primes <- function(n = 100)
{
    n <- as(abs(n), "integer")
    if(n < 2) return(integer(0))
    p <- 2:n
    smallp <- integer(0)
    # the sieve
    repeat {
        i <- p[1]
        smallp <- c(smallp, i)
        p <- p[p %% i != 0]
        if(i > sqrt(n)) break
    }
    return(c(smallp, p))
}

If we type primes at the S-PLUS prompt, all of the function code is
printed, including the comment. However, suppose we add another
comment after the last return statement when we define the function:

return(c(smallp, p)) # return the prime values

No expression follows this comment, so it is not printed when we
view the code of primes at the S-PLUS prompt.


SPECIFYING ARGUMENT LISTS


A well-chosen argument list can add considerable flexibility to most
functions. Some languages, notably C, make a distinction between a
function’s parameter list and a function call’s argument list. S-PLUS
maintains this distinction by speaking of an argument’s formal name,
which corresponds to the name specified in a parameter list, and its
actual name, which is used when actually calling the function. In this
section, we present many examples of argument lists in S-PLUS
functions and give suggestions for constructing your own.

Formal and Actual Names

When you define an S-PLUS function, you specify the arguments the
function accepts by means of formal names. Formal names can be any
combination of letters, numbers, and periods, as long as they are
syntactically valid and do not begin with a number. The formal name
... (three dots) is used to pass arbitrary arguments to a function; we
discuss this in the section Variable Numbers of Arguments (page 124).
For example, consider the argument list of the hist function:

> args(hist)
function(x, nclass = "Sturges", breaks, plot = TRUE,
    probability = FALSE, include.lowest = T, ...,
    xlab = deparse(substitute(x)))

The formal names for this argument list are x, nclass, breaks, plot,
probability, include.lowest, ..., and xlab.

When you call a function, you specify actual names for each argument.
Unlike formal names, an actual name can be any valid S-PLUS
expression that makes sense to the function. You can thus provide a
function call such as length(x) as an argument. For example, suppose
we want to create a histogram of the Mileage column in the
fuel.frame data set:

> hist(fuel.frame$Mileage)

The expression fuel.frame$Mileage is the actual name that
corresponds to the formal argument x.


Specifying Default Arguments

In general, there are two ways to specify default values for arguments
in an S-PLUS function:
• The simplest way is to use the structure formalname=value
when defining a formal argument. For example, consider
again the argument list for the hist function.

> args(hist)
function(x, nclass = "Sturges", breaks, plot = TRUE,
    probability = FALSE, include.lowest = T, ...,
    xlab = deparse(substitute(x)))

Default values are supplied for the nclass, plot, probability,
include.lowest, and xlab arguments.
• You can also specify defaults by providing code in the body of
a function that handles missing arguments. This technique is
useful if the code for computing a default value is too
complicated to include in the formal argument list. We discuss
this more in the next section.

Handling Missing Arguments

To test whether a given argument is supplied in the current function
call, use the construction if(missing(formalname)). For example, the
following code sample from hist shows how it handles a missing
breaks argument:

if(missing(breaks)) {
    if(is.character(nclass))
        nclass <- switch(casefold(nclass),
            sturges = nclass.sturges(x),
            fd = nclass.fd(x),
            scott = nclass.scott(x),
            stop("Nclass method not recognized"))
    else if(is.function(nclass)) nclass <- nclass(x)
    breaks <- pretty(x, nclass)
    if(length(breaks) == 1) {
        if(abs(breaks) < .Machine$single.xmin * 100)
            breaks <- c(-1, -0.5, 0.5, 1)
        else if(breaks < 0)
            breaks <- breaks * c(1.3, 1.1, 0.9, 0.7)
        else
            breaks <- breaks * c(0.7, 0.9, 1.1, 1.3)
    }
    if((!include.lowest && any(
        x <= breaks[1])) || any(x < breaks[1]))
        breaks <- c(breaks[1] - diff(breaks)[1], breaks)
    x[x > max(breaks)] <- max(breaks)
}

The construction if(missing(formalname)) is useful for specifying a
default value if the code for computing the default is too complicated
to include in the formal argument list. Otherwise, the construction
formalname=value is usually simpler.

Lazy Evaluation

Many programmers with experience in other programming
languages make too much use of missing-argument handling in
S-PLUS. This is because S-PLUS uses lazy evaluation, which means that
arguments are evaluated only as needed.
For example, consider the following simple plotting function:

plotsqrt <- function(x,y)
{
    z1 <- seq(1,x)
    if(missing(y)) plot(z1, sqrt(z1))
    else plot(z1,y)
}

In this function, the missing-argument construction supplies the
default value sqrt(z1) for the argument y. The default depends on
the value z1, which is unknown until the completion of the first line in
the body of the function. Because of this, many programmers avoid
defining the default in the formal argument list. However, lazy
evaluation allows us to do this in S-PLUS without receiving an error.
Thus, we can rewrite plotsqrt as follows:

plotsqrt2 <- function(x, y=sqrt(z1))
{
    z1 <- seq(1,x)
    plot(z1,y)
}

S-PLUS doesn’t need the value for y until the final expression, at
which time it can be successfully evaluated. In many programming
languages, such a function definition causes errors similar to
Undefined variable sqrt(z1). In S-PLUS, however, arguments
aren’t evaluated until the function body requires them.
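The same idea can be tried without a graphics device. In the sketch
below, the default for hi refers both to another argument and to a
variable that is created only inside the function body. The function
name clip and its arguments are hypothetical and are used only for
this illustration.

```r
# Hypothetical function: limits the values of x to [lo, hi].
clip <- function(x, lo = 0, hi = lo + width)
{
    # width exists only after this line has been evaluated.
    width <- 10
    # hi is first needed here, by which time width is defined.
    pmin(pmax(x, lo), hi)
}

c1 <- clip(c(-5, 3, 25))           # hi defaults to lo + width
c2 <- clip(c(-5, 3, 25), hi = 4)   # the default is not evaluated
```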


Variable Numbers of Arguments

When you write functions for custom graphics or statistical
procedures, you often build on existing functions that have large
numbers of arguments. Frequently, you need only a few new
arguments for your particular purpose. You can define only the
arguments you need, but this reduces flexibility by limiting your
access to the underlying function. You can specify defaults in the new
function that cover every argument of the underlying function, but
this is a burden during programming. Instead, you can use the special
formal name ... (three dots) to specify an arbitrary number of
arguments.
In the section Specifying Default Arguments (page 122), we saw one
example of the ... argument in the hist function. The hist function
is a special-purpose variant of the general function barplot, which
accepts a large number of arguments. Rather than duplicate all of the
barplot arguments, hist uses ... to pass any additional arguments the
user specifies directly to barplot.
Within the body of a function, the only valid use of ... is as an
argument inside a function call. In the following code fragment from
hist, the ... argument passes all unmatched arguments from hist
directly to barplot:

if(plot)
invisible(barplot(counts, width = breaks,
histo = T, ..., xlab = xlab))

The counts, breaks, and xlab objects are generated in the hist code
and passed to the formal arguments in barplot. In addition, anything
the user specifies that is not an element of the hist argument list is
given to barplot through the ... argument.
In general, arbitrary arguments can be passed to any function. You
can, for example, create a function that computes the mean of an
arbitrary number of data sets using the mean and c functions as
follows:

my.mean <- function(...) { mean(c(...)) }

As a variation, you can use the list function to loop over arguments
and compute the individual means of an arbitrary number of data
sets:

all.means <- function(...)
{
    dsets <- list(...)
    n <- length(dsets)
    means <- numeric(n)
    for(i in 1:n) means[i] <- mean(dsets[[i]])
    return(means)
}
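The explicit loop over the list can also be left to sapply, which
applies a function to each list element and simplifies the result to
a vector where possible. The following is a sketch only; all.means2
is a hypothetical name for this variant.

```r
# Hypothetical variant of all.means that loops implicitly.
all.means2 <- function(...)
{
    sapply(list(...), mean)
}

m <- all.means2(1:3, c(10, 20), 5)
```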

Note that formal arguments can follow ... in function definitions.
This construction is useful for functions such as my.mean and
all.means, which compute a return value from an arbitrary number
of data sets. To distinguish them from the data used to compute return
values, arguments that follow ... must be supplied by name when
included in a function call and they cannot be abbreviated. For
example, suppose we want to include the trim argument to mean in
the my.mean function. We can do this with the following function
definition:

my.mean <- function(..., trim=0.0)
{
    mean(c(...), trim=trim)
}

When calling my.mean, we can use the trim argument only by
explicitly naming it:

> my.mean(corn.rain, corn.yield, trim=0.5)
[1] 17.95

When an argument list includes ..., actual arguments that cannot be
matched to a formal argument are simply ignored. If the argument list
does not include ..., unmatched arguments generate an error of the
form:

Error in call to function: argument name not matched

Required and Optional Arguments

Required arguments are those for which a function definition provides
neither a default value nor missing-argument instructions. All other
arguments are optional. For example, consider again the argument list
for hist:

> args(hist)
function(x, nclass = "Sturges", breaks, plot = TRUE,
    probability = FALSE, include.lowest = T, ...,
    xlab = deparse(substitute(x)))

Here, x is a required argument. The breaks argument is optional
because code is included in the body of hist to handle the case when
breaks is missing. The nclass, plot, probability, include.lowest,
and xlab arguments are optional with defaults defined in the
argument list. The ... argument allows you to pass other arguments
directly to the barplot function. For information on defining defaults,
see the section Specifying Default Arguments (page 122).
To see a function’s required and optional arguments without viewing
S-PLUS code, see the on-line help. The hist help file, for example,
lists x as the only required argument; the remaining arguments are all
listed as optional.


ERROR HANDLING
An often neglected aspect of function writing is error-handling, in
which you specify what to do if something goes wrong. When writing
quick functions for your own use, it doesn’t make sense to invest
much time in “bullet-proofing” your functions: that is, in testing the
data for suitability at each stage of the calculation and providing
informative error messages and graceful exits from the function if the
data proves unsuitable. However, good error handling becomes
crucial when you broaden the intended audience of your function.
In the section Flow of Control (page 108), we saw one mechanism in
stop for implementing graceful exits from functions. The stop
function immediately stops evaluation of the current function, issues
an error message, and then dumps debugging information to a data
object named last.dump. The last.dump object is a list that can either
be printed directly or reformatted using the traceback function. For
example, here is the error message and debugging information
returned by the my.ran function from page 112:

# Call my.ran with an unrecognized distribution.
> my.ran(10, distribution="unif")
Problem in my.ran(10, distribution = "unif"): distribution
must be "gamma", "exp", or "norm"
Use traceback() to see the call stack

> traceback()
6: eval(action, sys.parent())
5: doErrorAction("Problem in my.ran(10, distribution =
\"unif\"): distribution must be \"gamma\", \"exp\", or
\"norm\"",
4: stop("distribution must be \"gamma\", \"exp\", or
\"norm\"")
3: my.ran(10, distribution = "unif")
2: eval(expression(my.ran(10, distribution = "unif")))
1:
Message: Problem in my.ran(10, distribution = "unif"):
distribution must be "gamma", "exp", or "norm"


The amount of information stored in last.dump is controlled by the
error argument to the options function. The default value is
dump.calls:

> options()$error
expression(dump.calls())

The dump.calls function stores a list of function calls, starting with
the top-level call and including all calls within the function up to and
including the one that produced the error. Another option, the
dump.frames function, provides more information because it includes
the complete set of frames created during the evaluation. However,
dump.frames can generate a very large last.dump object; it should
therefore be used only for debugging purposes and not for general
error-handling. Other possibilities for the error argument to options
are discussed in Chapter 6, Debugging Your Functions.
It is good programming practice to place stop statements within
functions to mark the limits of the function’s capability. For example,
we can rewrite our newton2 function from page 116 so that it stops
evaluation if there are no real roots to compute:

newton3 <- function(n, j=2, x=1)
{
    # n may be a vector, so test whether any element is negative.
    if(any(n < 0) && j %% 2 == 0)
        stop("No real roots")
    old.x <- x
    repeat
    {
        new.x <- old.x-((old.x^j-n) / (j * old.x^(j-1)))
        conv <- sum((new.x - old.x)^2 / old.x^2)
        if(conv < 1e-10)
            return(old.x)
        old.x <- new.x
    }
}

The warning function is similar to stop, but does not cause S-PLUS to
stop evaluation. Instead, S-PLUS continues evaluating after the
warning message is printed to the screen. This is a useful technique
for warning users about potentially hazardous conditions such as data
coercion:


if (!is(x, "numeric")) {
    warning("Coercing to mode numeric")
    x <- as(x, "numeric")
}
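Placed inside a function, the same pattern lets the calculation
continue after the warning is issued. The following is a minimal
sketch; total is a hypothetical name, and the sketch uses is.numeric
and as.numeric for the test and the coercion.

```r
# Hypothetical helper for illustration; not part of S-PLUS.
total <- function(x)
{
    if(!is.numeric(x)) {
        # Tell the user about the coercion, then carry on.
        warning("Coercing to mode numeric")
        x <- as.numeric(x)
    }
    sum(x)
}

tot <- total(c("1", "2", "3.5"))   # warns, then returns the sum
```

Unlike stop, the warning call does not end the function, so the
final sum is still computed and returned.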

As with most matters of programming style, the degree to which you
incorporate stops and warnings depends on the level of finish you
intend for your functions. Functions for distribution to others should
be held to a higher standard than functions for your own use.


INPUT AND OUTPUT

Data Input

Most data input to S-PLUS functions is in the form of named objects
passed as required arguments to the functions. For example:

> mean(corn.rain)
[1] 10.78421

Data can also be generated “on-the-fly” by passing S-PLUS
expressions as arguments, such as calls to the c function:

> mean(c(5,9,23,42))
[1] 19.75

However, if you build turnkey systems or other applications in which
you want to hide as much of the S-PLUS machinery as possible, your
needs may go beyond this. Instead, you might want to build functions
that read data from an existing file, create an S-PLUS object from the
data, perform some analysis, and then return a value. Such functions
conceal much of the structure of S-PLUS objects from users who may
not know (or care to know) such details.
The principal tools for reading data from files are scan, read.table,
and importData. The scan function reads ordinary sequential text
files, the read.table function imports tabular text data into S-PLUS
data frames, and importData reads data from a number of different
file formats. Chapter 5, Importing and Exporting, discusses the three
functions in detail.

Data Output

S-PLUS is an interactive system, so virtually anything you type
prompts a response from S-PLUS. In general, this response is the value
of the evaluated expression, which S-PLUS prints automatically. If the
value is assigned, however, automatic printing is not performed:

> 7 + 3
[1] 10
> a <- 7 + 3

Other responses from S-PLUS range from error messages to
interactive prompts within a function call. We discuss error messages
in the section Error Handling (page 127). The following subsections
discuss four direct forms of creating output: return values, side effects,
permanent data files, and temporary files.

Formatting Output

The format of printed return values in S-PLUS is determined partially
by the mode of a returned object and partially by various session
options. The examples below discuss different session options you
can use to format output from your functions.

The width and length options


The width argument to the options function specifies the number of
characters that fit on a line of output. By default, width=80:

> options()$width
[1] 80

The length argument to the options function specifies the number of
lines that fit on a page of output. The length option also indicates
where S-PLUS places dimnames attributes when a large matrix is
printed; S-PLUS prints dimnames once on each page of output, and the
length option governs how much information fits on a page. By
default, length=48:

> options()$length
[1] 48

The digits option


The digits argument to the options function specifies the number of
significant digits to print. By default, digits=7. To see full double
precision output, set digits=17 as follows:

> options(digits=17)
> pi
[1] 3.1415926535897931

Most print methods include a digits argument that can be used to
override the value of options()$digits. Thus, you can call print
explicitly with the desired number of significant digits. For example:

# Reset the digits option to its default.
> options(digits=7)
> print(pi, digits=17)
[1] 3.1415926535897931

You can also change the digits value through the General Settings
dialog; select Options ► General Settings and click on the
Computations tab to see this. It is important to note that any option
changed through the GUI persists from session to session. In contrast,
options changed via the options function are restored to their default
values when you restart S-PLUS. For more details, see the help files for
the options function and the Command Line Options dialog.

The format, round, and signif functions


To print numeric data as a formatted character string, use the format
function. This function returns a character vector the same length as
the input in which all elements have the same length. The length of
each element in the output is usually determined by the digits
option. For example, the following command uses the default digits
value of 7 to format and print the vector sqrt(1:10):

> format(sqrt(1:10))
[1] "1.000000" "1.414214" "1.732051" "2.000000" "2.236068"
[6] "2.449490" "2.645751" "2.828427" "3.000000" "3.162278"

Alternatively, we can set digits=3 as follows:

> options(digits=3)
> format(sqrt(1:10))
[1] "1.00" "1.41" "1.73" "2.00" "2.24" "2.45" "2.65"
[8] "2.83" "3.00" "3.16"

The format function also includes a digits argument that can be
used to override the value of options()$digits. The format function
interprets digits as the number of significant digits retained, but it
replaces trailing zeros with blanks:

# Reset the digits option to its default.
> options(digits=7)
> format(sqrt(1:10), digits=3)
[1] "1   " "1.41" "1.73" "2   " "2.24" "2.45" "2.65"
[8] "2.83" "3   " "3.16"

To include trailing zeros, you can use the nsmall argument to format,
which sets the minimum number of digits to include after the decimal
point:

> format(sqrt(1:10), digits=3, nsmall=2)
[1] "1.00" "1.41" "1.73" "2.00" "2.24" "2.45" "2.65"
[8] "2.83" "3.00" "3.16"

The nsmall argument is discussed in the help file for format.default.


You can use the round and signif functions to further control the
action of the digits argument to format. The round function uses
digits to specify the number of decimal places, while signif uses it
to specify the number of significant digits retained. For example, note
the difference in the output from the following two commands:

> format(round(sqrt(1:10), digits=5))
[1] "1.00000" "1.41421" "1.73205" "2.00000" "2.23607"
[6] "2.44949" "2.64575" "2.82843" "3.00000" "3.16228"

> format(signif(sqrt(1:10), digits=5))
[1] "1.0000" "1.4142" "1.7321" "2.0000" "2.2361" "2.4495"
[7] "2.6458" "2.8284" "3.0000" "3.1623"


Warning

If you want to print numeric values to a certain number of digits, do not use print followed by
round. Instead, use format to convert the values to character vectors and then specify a certain
number of digits. Printing numbers with print involves rounding, and rounding an
already-rounded number can lead to anomalies. To see this, compare the output from the
following two commands, for x <- runif(10):

> round(print(x), digits = 5)
> as.numeric(format(x, digits = 5))

Note that the second command prints the correct number of digits but the first one doesn’t.
This warning applies to all functions that use print, such as var and cor, and not just to the print
function itself.

Constructing Return Values

When the body of a function is an expression enclosed in braces, the
value of the function is the value of the last expression inside the
braces. This fits well with the usual top-down design paradigm, where
the goal is to start with some input, proceed through a set of
operations, and return the finished output. For most simple functions,
you need to verify only that the final value is what you actually want
returned. Thus, if the body of a function carries out a series of
replacements, the final line might be the name of the object in which
the replacements were done. For example, the following function
returns a modified version of the input object x:

bigger <- function(x,y)
{
    y.is.bigger <- y > x
    x[y.is.bigger] <- y[y.is.bigger]
    x
}

Even in simple functions like this, however, we recommend that you
explicitly use a return statement to clearly identify the returned
value:

bigger <- function(x,y)
{
    y.is.bigger <- y > x
    x[y.is.bigger] <- y[y.is.bigger]
    return(x)
}

Often, you need to return a set of values that are generated
throughout a function. To do this, assign the intermediate calculations
to temporary objects within the function and then gather the objects
into a return list. For example, suppose you have a data file
containing daily sales for each of ten department stores over a span of
one month. Each month, you want to compute a summary of that
month’s sales using the daily sales information as the input data. Here
is a function named monthly.summary that reads in such a data file,
creates a matrix of the input data, and then performs the desired
analysis:

monthly.summary <- function(datafile)
{
    x <- matrix(scan(datafile), nrow=10, byrow=T)
    store.totals <- rowSums(x)
    mean.sales <- mean(store.totals)
    attr(mean.sales, "dev") <- stdev(store.totals)
    best.performer <- match(max(store.totals), store.totals)
    return(list("Total Sales" = store.totals,
        "Average Sales" = mean.sales,
        "Best Store" = best.performer))
}

Notice that the function has no side effects. All calculations are
assigned to objects in the function’s frame, which are then combined
into a list and returned as the value of the function. This is the
preferred method for returning a number of different results in an
S-PLUS function.
Suppose we have data files named april.sales and may.sales
containing daily sales information for April and May, respectively.
The following commands show how monthly.summary can be used to
compare the data:

> Apr92 <- monthly.summary("april.sales")
> May92 <- monthly.summary("may.sales")
> Apr92

$"Total Sales":
[1] 55 59 91 87 101 183 116 119 78 166


$"Average Sales":
[1] 105.5
attr($"Average Sales", "dev"):
[1] 42.16436

$"Best Store":
[1] 6

> May92

$"Total Sales":
[1] 65 49 71 91 105 163 126 129 81 116

$"Average Sales":
[1] 99.6
attr($"Average Sales", "dev"):
[1] 34.76013

$"Best Store":
[1] 6

As we discuss in the section Assignments (page 89), creating
permanent objects from within functions is a dangerous practice
because it can overwrite existing objects in your working directory.
Thus, if our monthly.summary function creates permanent objects
named store.totals, mean.sales, and best.performer instead of
returning them as a list, we would lose the objects every time we ran
the function. Instead, we recommend the list paradigm discussed
above for returning a number of different results in an S-PLUS
function.
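
Because the results come back as a single list, individual components
can be extracted by name with the $ operator or with double brackets.
As a brief illustration using the Apr92 and May92 objects created
above:

> Apr92$"Best Store"
[1] 6
> Apr92[["Best Store"]]
[1] 6

# Change in each store's total sales from April to May.
> May92[["Total Sales"]] - Apr92[["Total Sales"]]
 [1]  10 -10 -20   4   4 -20  10  10   3 -50

Nothing in the working directory other than Apr92 and May92 is
created or modified by these commands.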

Side Effects

A side effect of a function is any result that is not part of the returned
value. Examples include graphics plots, printed values, permanent
data objects, and modified session options or graphical parameters.
Not all side effects are bad; graphics functions are written to produce
side effects in the form of plots, while their return values are usually of
no interest. In such cases, you can suppress automatic printing with
the invisible function, which invisibly returns the value of a
function. Most of the printing functions, such as print.atomic, do
exactly this:

> print.atomic
function(x, quote = T, ...)
{
    if(length(x) == 0.)
        cat(mode(x), "(0)\n", sep = "")
    else .Call("s_pratom", x, TRUE, quote)
    invisible(x)
}

You should consciously try to avoid hidden side effects because they
can wreak havoc with your data. Permanent assignment from within
functions is the cause of most bad side effects. Many S-PLUS
programmers are tempted to use permanent assignment because it
allows expressions inside functions to work exactly as they do at the
S-PLUS prompt. The difference is that if you type

myobj <<- expression

at the S-PLUS prompt, you are likely to be aware that myobj is about to
be overwritten if it exists. In contrast, if you call a function that
contains the same expression, you may have no idea that myobj is
about to be destroyed.
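
When a function must change session state, one way to keep the change
from leaking out is to undo it before the function returns. The
following sketch prints a value with three significant digits and then
restores the previous digits setting. The function name short.print is
hypothetical, and the sketch assumes that on.exit is used to schedule
the cleanup, in the manner described in the section Wrap-Up Actions:

short.print <- function(x)
{
    # Remember the current setting before changing it.
    old <- options()$digits
    options(digits = 3)
    # Restore the old setting when the function exits,
    # even if printing fails.
    on.exit(options(digits = old))
    print(x)
    # Return the value without printing it a second time.
    invisible(x)
}

> short.print(pi)

After the call, options()$digits has the same value it had before, so
the only visible side effect is the printed output.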

Writing to Files

In general, writing data to files from within functions can be as
dangerous a practice as permanent assignment. Instead, it is safer to
create special functions that generate output files. Such functions
should include arguments for specifying the output file name and the
format of the included data. The actual writing can be done by a
number of S-PLUS functions, the simplest of which are write,
write.table, cat, sink, and exportData. The write and write.table
functions are useful for retaining the structure of matrices and data
frames, while cat and sink can be used to create free-format data
files. The exportData function creates files in a wide variety of
formats. See Chapter 5, Importing and Exporting, for details.
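
As a minimal sketch of such a special-purpose output function, the
following definition writes a matrix to a text file so that each row
of the matrix becomes one line of the file. The function name
save.matrix and the file name mat3.txt are hypothetical:

save.matrix <- function(x, file)
{
    # write works column by column, so transpose first and
    # put one row of x on each line of the file.
    write(t(x), file = file, ncolumns = ncol(x))
    # The file is the intended result; return its name invisibly.
    invisible(file)
}

> save.matrix(matrix(1:12, ncol=4), file = "mat3.txt")

Because the output file is named explicitly in the call, the user
always knows which file is about to be created or overwritten.
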
Functions such as write, cat, and exportData all generate files
containing data; no S-PLUS structure is written to the files. If you wish
to write the actual structure of your S-PLUS data objects to text files,
use the dump, data.dump, or dput functions. We discuss each of these
below.


The write and write.table functions


The write function writes S-PLUS vectors and matrices to specified
files. It writes matrices column by column and includes five values in
each line of the output file. For example, consider the following
matrix, which we write to the output file mat1.txt:

> mat <- matrix(1:12, ncol=4)
> mat
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12

> write(mat, file="mat1.txt")

S-PLUS stores mat1.txt in your working directory. You can view it in
the text editor or pager of your choice. It contains the following three
lines:

1 2 3 4 5
6 7 8 9 10
11 12

If you want to write a matrix to an output file in the same form as it
appears in S-PLUS, transform the matrix first with the t function and
specify the number of columns with the ncolumns argument. For
example:

> write(t(mat), file="mat2.txt", ncolumns=4)

The mat2.txt file looks similar to the object mat, and contains the
following lines:

1 4 7 10
2 5 8 11
3 6 9 12

Alternatively, you can use the write.table function to write a vector,
matrix, or data frame to a specified file. With write.table, you do
not need to transpose the data object, and you can include row and
column names in the output. For example, the following command
creates a tab-delimited text file fuel.txt that contains the fuel.frame
data set:


> write.table(fuel.frame, file = "fuel.txt", sep = "\t")

The cat and sink functions


The cat function is a general-purpose writing tool that can be used to
write to the screen as well as to files. The cat function is helpful for
creating free-format data files, particularly when it is used with the
format function. For example:

# Set the seed for reproducibility.
> set.seed(21)
> x <- runif(10)
> cat(format(x), fill=T)
0.8854639 0.3739834 0.4220316 0.2441107 0.6033186
0.5787458 0.3944685 0.5834372 0.1457345 0.4555785

The argument fill=T limits the width of each line in the output file to
the width value specified in the options list. For more details on the
format function and the width option, see the section Formatting
Output (page 131).

To write to a file with cat, simply specify a file name with the file
argument:

> cat(format(x), file="mydata1.txt")

S-PLUS stores mydata1.txt in your working directory. It overwrites
any existing file named mydata1.txt unless you set the argument
append=TRUE in the call to cat.
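
For instance, the following sketch builds up a file in two steps, using
the vector x created above. The file name mydata3.txt is hypothetical.
The first call creates (or overwrites) the file with a header line, and
the second call adds the data after the header instead of replacing it:

> cat("Uniform random numbers:\n", file="mydata3.txt")
> cat(format(x), fill=T, file="mydata3.txt", append=T)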

The sink function directs S-PLUS output into a file rather than to the
screen. It can be used as an alternative to multiple
cat(..., append=T) statements. For example, the following
commands open a sink to a file named mydata2.txt, write x to the file
in three different ways, and then close the sink so that S-PLUS writes
future output to the screen:

> sink(file = "mydata2.txt")
> x
> format(x)
> format(x, digits=3)
> sink()

For more examples using sink, see the section Standard Connections
(page 145).


The dump, data.dump, and dput functions


Files written by cat and write do not contain information regarding
the structure of S-PLUS objects. To read the files back into S-PLUS
objects, you must reconstruct this information. To write ASCII
versions of S-PLUS objects that contain complete structural
information, use the dump, data.dump, and dput functions.
The dump function is primarily a programmer’s tool. It allows you to
create editable, sourceable versions of S-PLUS functions. You can use
dump, for example, to distribute a collection of functions via electronic
mail. Alternatively, you can use dump to create a text file of a function,
edit it outside of S-PLUS, and then send the modified version to
another user. To read your dumped functions back into S-PLUS, use
the source function.
The data.dump function writes S-PLUS data objects to files. It uses a
text-based format so that the objects can be restored on any machine.
Because of the special text format, you should not edit the files
generated by data.dump. Instead, the primary uses of data.dump
include transferring objects between versions of S-PLUS, between
machines, or between users. To read your dumped objects back into
S-PLUS, use the data.restore function.

Note

In earlier versions of S-PLUS, the dump function could be used to transfer data objects such as
matrices and lists between machines. This behavior is no longer supported in SV4 versions of
S-PLUS. Currently, dump is used only for creating editable text files of S-PLUS functions; use
data.dump to transfer your data objects between machines. For more details, see the help files for
these two functions.

The dput function can be thought of as a companion to assign.
Where assign creates S-PLUS objects in binary form, dput creates
them in ASCII text. The output from dput can be read back into
S-PLUS with the dget function.

As with data.dump, you can use the dput function to transfer objects
between machines. However, the formats used by the two functions
are slightly different. To see this, note the differences in the two files
generated by the following commands:

# Set the seed for reproducibility.
> set.seed(49)
> tmp.df <- data.frame(x=1:10, y=runif(10))
> dput(tmp.df, file="mydata1.txt")
> data.dump("tmp.df", file="mydata2.txt")

The files mydata1.txt and mydata2.txt are stored in your working
directory.

Files created by data.dump include the names of the dumped objects.
Thus, you can read mydata2.txt into S-PLUS with data.restore and
the object tmp.df becomes available in your working directory. In
contrast, files created by dput include the contents of objects, but not
the object names. The following commands illustrate this:

# Remove tmp.df and restore the contents
# of the file created by data.dump.
> rm(tmp.df)
> data.restore("mydata2.txt")
[1] "mydata2.txt"

> tmp.df

x y
1 1 0.54033146
2 2 0.27868110
3 3 0.31963785
4 4 0.26984466
5 5 0.75784146
6 6 0.32501004
7 7 0.90018579
8 8 0.04155586
9 9 0.28102661
10 10 0.09519871

# Remove tmp.df and restore the contents
# of the file created by dput.
> rm(tmp.df)
> dget("mydata1.txt")

x y
1 1 0.54033146
2 2 0.27868110
3 3 0.31963785


4 4 0.26984466
5 5 0.75784146
6 6 0.32501004
7 7 0.90018579
8 8 0.04155586
9 9 0.28102661
10 10 0.09519871

> tmp.df
Problem: Object "tmp.df" not found

You must assign the output from dget to access its contents in your
working directory:

> tmp.df <- dget("mydata1.txt")

Creating Temporary Files

You can use cat, write, and dput together with the tempfile function
to create temporary files that have unique names. Such files are
convenient to use for a variety of purposes, including text processing
tools. For example, the built-in ed function creates a temporary file
that holds the object being edited:

> ed
function(data, file = tempfile("ed."),
    editor = "notepad", error.expr)
{
    drop <- missing(file)
    if(missing(data)) {
        if(!exists(".Last.file"))
            stop("Nothing available for re-editing")
        file <- .Last.file
        data <- .Last.ed
    }
    else if(mode(data) == "character" &&
        length(attributes(data)) == 0)
        cat(data, file = file, sep = "\n")
    else if(is.atomic(data) &&
        length(attributes(data)) == 0)
        cat(data, file = file, fill = T)
    else dput(data, file = file)
    . . .

The tempfile function creates a unique name for a temporary file. In
the ed function above, the unique name is composed of the character
string ed. and a unique ID number. Note that tempfile generates
only a name for a temporary file and not the file itself. You must use
cat, write, or dput to actually create and write to the file.

The temporary files created with tempfile are ordinary files written
to the directory specified by the S_TMP environment variable.
Customarily, this directory is a temporary storage location that is
wiped clean frequently. To prevent overloading this directory, it is
best if you incorporate file cleanup into your functions that utilize
tempfile. This is discussed in the section Wrap-Up Actions (page
158). For more information on S-PLUS environment variables such as
S_TMP, see Chapter 18, The S-PLUS Command Line and the System
Interface.
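
The following sketch shows one way to combine tempfile with cleanup
inside a function. It writes a vector to a temporary file, reads the
file back, and removes the file on exit. The function name count.items
is hypothetical, and the sketch assumes that unlink is available for
removing files and that on.exit is used as described in the section
Wrap-Up Actions:

count.items <- function(x)
{
    # tempfile returns only a name; no file exists yet.
    tmp <- tempfile("cnt.")
    # Remove the file when the function exits.
    on.exit(unlink(tmp))
    # cat creates the file and writes the data to it.
    cat(x, file = tmp, fill = T)
    # Read the data back and count the items.
    return(length(scan(tmp)))
}

> count.items(c(5,9,23,42))

Scheduling the cleanup immediately after generating the name means
the temporary file is removed even if a later step fails.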

Connections

Connections are mechanisms for connecting S-PLUS to other processes
in the computing environment. With connections, S-PLUS can
efficiently and easily read or write streams of data. The most common
example of a connection is a physical file; other examples include
external processes that read or write data, and S-PLUS character
vectors.
In general, connections provide several facilities for the S-PLUS
programmer:
1. They provide a uniform mechanism for functions that need to
read or write data.
2. They allow mixed reading and writing during a single S-PLUS
session. Because S-PLUS manages all active connections,
operations that are difficult or error-prone at a lower level are
tractable with connections.
3. They hide many of the low-level programming details needed
for doing input and output.
If you can express input and output computations in terms of
connections, the result is usually convenient, reliable, and efficient
code. In this section, we give a brief overview of S-PLUS connections;
for further details and additional topics not discussed here, see
Chapter 10 in Chambers (1998).

Connection Classes

Table 4.7 lists the connection classes available in S-PLUS. Each of
these classes extends the virtual class "connection".

Table 4.7: Classes of S-PLUS connections.

Connection Class   Description

file               File connection. This is represented by a character
                   vector naming the path of the file. If no path is
                   supplied when the connection is opened, a temporary
                   one is created.

pipe               System command, with standard input that S-PLUS can
                   write to and standard output that S-PLUS can read
                   from. This is represented by a character vector naming
                   the command. S-PLUS opens the connection by
                   executing the command; data written to the pipe then
                   becomes its standard input.

fifo               First-in first-out connection. This is represented like a
                   file, by a character vector naming a path. Unlike a
                   file, a fifo holds on to data only until it is read, at
                   which point the data effectively disappears. For this
                   reason, a fifo is sometimes referred to as a named pipe.

textConnection     Text connection. This is represented by an S-PLUS
                   character vector. This class is provided mainly as a
                   convenience, so that you can use objects containing
                   character vectors in computations that expect to read
                   from connections. By design, text connections are
                   read-only in S-PLUS.

All four classes listed in the table are functions that can be used to
(optionally) open the described connections and return S-PLUS
connection objects. Connection objects are one of the primary tools for
managing connections in S-PLUS. For example, the following
command opens a file connection to myfile.dat and assigns the value
to the connection object filecon.


> filecon <- file("myfile.dat")

As a side effect, the call to file opens the connection, so you may
be tempted to think that the returned object is of little interest.
However, conscientious use of connection objects results in cleaner
and more flexible code. For example, you can use these objects to
delay opening particular connections. Each connection class has an
optional argument open that can be used to suppress opening a
connection. With the returned connection object, you can use the
open function to explicitly open the connection when you need it:

# Create a file connection object but do not open it.
> textfile <- file("myfile.dat", open=F)

# After some time, open the connection.
> filecon <- open(textfile)

# After reading from or writing to the connection, close it.
> close(filecon)

Most S-PLUS connection functions abide by the following simple
rules:
• If a connection is not currently open, open it and ensure the
connection is closed at the end of the function call.
• If a connection is already open, leave it open at the end of the
function call.
Thus, if you use one of the functions listed in Table 4.7 to open a
connection, you do not need to explicitly close it. However, if you use
the open function, you should ensure that the connection is properly
closed by using the close function. Organizing your computations in
this way prevents forgotten connections from consuming machine
resources. See the section Support Functions for Connections (page
149) for additional tips on managing connections.
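
The explicit open-and-close pattern described above can be sketched
inside a function as follows. The function name read.numbers is
hypothetical, and the use of on.exit to guarantee that the connection
is closed is an assumption of this sketch rather than a requirement:

read.numbers <- function(path)
{
    # Create the connection object without opening it.
    con <- file(path, open = F)
    # Open it explicitly as read-only; from here on, this
    # function is responsible for closing it.
    con <- open(con, mode = "r")
    # Close the connection on exit, even if scan fails.
    on.exit(close(con))
    return(scan(con))
}

> x <- read.numbers("myfile.dat")

Because the function opens the connection itself, it also closes it,
so no connection is left open after the call.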

Standard Connections

Table 4.8 lists a number of predefined connections available in
S-PLUS. The stdin, stdout, and stderr functions organize user
interactions into three traditional streams of data: input from the user,
printed output, and errors or messages for the user. The function calls
stdin(), stdout(), and stderr() return the current connections
associated with standard input, standard output, and standard error,
respectively. When S-PLUS is running as an interactive session,
standard input is your keyboard and both standard output and
standard error are your display.
Table 4.8: S-PLUS functions associated with standard connections.

Standard Connection   Description

stdin                 Standard input.

stdout                Standard output.

stderr                Standard error. S-PLUS writes messages to this
                      connection.

auditConnection       Connection to which S-PLUS writes auditing
                      information about the session. See Chapter 18,
                      The S-PLUS Command Line and the System
                      Interface, for information on the session audit
                      file.

clipboardConnection   System clipboard for the S-PLUS session.

sink                  Function that redirects the output associated
                      with standard connections.

You can redirect the output connection to another connection (usually
a file) with the sink function. The sink command exists primarily for
its side effect: sinked output remains redirected until another call to
sink explicitly alters it. For example, the following commands
redirect the standard output connection to the file x.out.

# Open the sink.
> sink("x.out")

# Generate 50 random numbers and print out
# their mean and variance.
> x <- runif(50)
> mean(x)
> var(x)

# Close the sink.
> sink()

Connection Modes

By default, file, fifo, and pipe connections are opened for both
reading and writing, appending data to the end of the connection if it
already exists. While this behavior is suitable for most applications,
you may require different modes for certain connections. Example
situations include:
• Opening a file connection as read-only so that it is not
accidentally overwritten.
• Opening a file connection so that any existing data on it is
overwritten, rather than appended to the end of it.
You can change the default mode of most connections through the
mode argument of the open function. For example, the following
commands open a file connection as write-only. If we try to read from
the connection, S-PLUS returns an error:

# Create a file connection object but do not open it.
> textfile <- file("myfile.dat", open=F)

# Open the connection as write-only.
> filecon <- open(textfile, mode = "w")

> scan(filecon)
Problem in scanDefault(file, what, n): "myfile.dat" already
opened for "write only": use reopen() to change it

# Close the connection.
> close(filecon)
[1] T

As the error message suggests, you can use the reopen function to
close the connection and reopen it with a different value for mode.

Note

The mode of a textConnection cannot be changed. By design, text connections are read-only.


Instead of explicitly calling open, you can supply the desired mode
string to the open argument of one of the connection classes. Thus, the
following command illustrates a different way of opening a file as
write-only:

> filecon <- file("myfile.dat", open = "w")

Table 4.9 lists the most common mode strings used to open
connections in S-PLUS.
Table 4.9: Common modes for S-PLUS connections.

Mode String   Description

"rw"          Open for reading and writing, overwriting current
              data on the connection if it already exists.

"ra"          Open for reading and writing, appending data to
              the current version of the connection if it already
              exists. Writing is allowed only at the end of the
              connection.

"r"           Open for reading only.

"w"           Open for writing only, overwriting current data on
              the connection if it already exists.

"a"           Open for writing only, appending data to the
              current version of the connection if it already
              exists. Writing is allowed only at the end of the
              connection.

"*"           Open for reading and writing, appending data to the
              current version of the connection if it already
              exists. Writing is allowed anywhere in the
              connection; the initial write position is at the end.

""            Do not open the connection. Unopened
              connection objects can be opened explicitly at a
              later time using the open command.
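
As a short illustration of a mode string in use, the following sketch
opens a file connection in append mode and writes two lines to it
through cat. The file name mylog.txt and the object name logcon are
hypothetical:

# Open for writing only; new data goes at the end of the file.
> logcon <- file("mylog.txt", open = "a")
> cat("Analysis started\n", file = logcon)
> cat("Analysis finished\n", file = logcon)

# The connection was opened explicitly, so close it explicitly.
> close(logcon)

With the "a" mode, any text already in mylog.txt is kept and the two
new lines are added after it.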

Support Functions for Connections

The functions listed in the two tables below provide support for
managing connections in your S-PLUS session: Table 4.10 describes
functions that allow you to see any active connections and Table 4.11
describes functions that prepare connections for reading or writing.
We have already seen the open and close functions in previous
sections. In the text below, we describe each of the remaining support
functions.
Table 4.10: S-PLUS functions for managing active connections.

Management Function   Description

getConnection         Returns the connection corresponding to the
                      input argument.

getAllConnections     Returns a list of all open connections.

showConnections       Prints a table of all open connections.

Table 4.11: Support functions that prepare connections for reading or writing.

Support Function      Description

open                  Open a connection explicitly for reading or
                      writing.

isOpen                Check whether a connection is open.

close                 Close a connection after reading from or
                      writing to it.

seek                  Position a file connection for reading or writing.

The getConnection and showConnections functions


You can view all active connections in your S-PLUS session by using
the functions getAllConnections and showConnections. The
getAllConnections function returns a list of all open connections that
includes information on the class and mode of each. The
showConnections function displays this information in a convenient
tabular format. For example, suppose we open two connections to
text files:

> filecon <- open("mydata.txt")
> filecon2 <- open("mydata2.txt", mode="w")

The showConnections function returns the following:

> showConnections()
   Class  Mode State   Description
52 "file" "*"  "Read"  "mydata.txt"
56 "file" "w"  "Write" "mydata2.txt"

The number at the beginning of each row in the table is a unique
descriptor for the corresponding connection. We can use these
numbers with the getConnection function to access individual
connection objects. For example, the following command closes the
connection to mydata.txt:

> close(getConnection(52))
[1] T

For file connections, we can also supply getConnection with a
character string naming the file:

> close(getConnection("mydata2.txt"))
[1] T

The seek function


S-PLUS maintains separate positions on connections for reading and
writing. Thus, you can write to a connection starting from a different
location than the one used to read from the connection. Because the
positions are separate, you may need explicit control over positioning
in a connection object. The seek function allows you to do this with
file connections. It accepts the following arguments:

• A file connection.
• The argument where, which is a position measured in bytes
from the start of the file.
• The argument rw, which determines whether the "read" or
"write" position is modified.

If where is given, seek moves the rw position to the specified value;
otherwise, it returns the current rw position.
The following example from Chambers (1998) illustrates this
function. Suppose an open file connection named f exists. We read
one expression from it and then leave the connection so that reading
begins again with the same expression.

# Return the reading position in the file.
> pos <- seek(f, rw = "read")

# Parse one expression.
> myexpr <- parse(f, n = 1)

# Reset the reading position.
> seek(f, where = pos, rw = "read")

For pipe and fifo connections, data is read in the same order in
which it is written. Thus, there is no concept of a "read" position for
these connections. Likewise, data is always written to the end of pipes
and fifos, so there is also no concept of a "write" position. For
textConnection objects, only "read" positions are defined.

Reading from and Writing to Connections

Table 4.12 lists the main S-PLUS functions for reading from and
writing to connections. Wherever possible, we pair functions in the
table so that relationships between the reading and writing functions
are clear. For details on the scan, cat, data.restore, data.dump,
source, dump, dget, and dput functions, see the section Writing to
Files (page 137). For details on readRaw and writeRaw, see the section
Raw Data Objects (page 154). For examples using any of these
functions, see the on-line help files.
Table 4.12: S-PLUS functions for reading from and writing to connections. The first column in the table lists
functions for reading; the second column lists the corresponding writing functions (if any).

Reading        Writing
Function       Function       Description

parse                         Read n S-PLUS expressions.

parseSome                     Read n lines or 1 S-PLUS expression.

scan           cat            Read n data items.
                              Write any number of data items.

readLines      writeLines     Read n lines and return one character vector per line.
                              Write n lines, consisting of one character vector per line.

read.table     write.table    Read a two-dimensional table of data.
                              Write a two-dimensional table of data.

readRaw        writeRaw       Read raw data objects.
                              Write raw data objects.

data.restore   data.dump      Read dumped data objects.
                              Write S-PLUS data objects to their dumped forms.

source         dump           Parse and evaluate n S-PLUS expressions.
                              Write text representations of S-PLUS objects.

dget           dput           Read expressions that represent S-PLUS objects.
                              Write expressions that represent S-PLUS objects.

dataGet        dataPut        Read the S-PLUS symbolic dump format.
                              Write objects in the S-PLUS symbolic dump format.

Examples of Pipe Connections

The examples throughout most of this section deal mainly with file
connections. This is because files are often the easiest of the
connection classes to visualize applications for, while pipes and fifos
tend to have more specialized applications. Here, we present three
examples that illustrate how you might use pipe connections in your
work.

Reading files compressed by gzip


The gzip program is a popular compression program that is
distributed under the GNU General Public License. Binary versions of
gzip are available from the Web site http://www.gzip.org for most
flavors of UNIX, Linux, and Windows.
Suppose you have a space-delimited collection of numbers stored in
the file primes.txt:

2 3 5 7 11
13 17 19 23 29
31 37 41 43 47
53 59 61 67 71
73 79 83 89 97

To compress the file and write the results in primes.gz, issue the
following system command:
gzip -c primes.txt > primes.gz
The following commands read the compressed file in S-PLUS:

> p1 <- pipe("gzip -d -c primes.gz")


> scan(p1, sep=" ")

[1] 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61
[19] 67 71 73 79 83 89 97

Using Perl to remove trailing commas


Perl is a powerful scripting language that is very good at manipulating
text files. Binaries of Perl are freely available for nearly every
operating system; you can find out more at http://www.perl.com.
The ability to write quick, one-line scripts in Perl makes it ideal for
preprocessing data files through S-PLUS pipes. For example, suppose
you have a comma-delimited data file named comma.txt:

8.4,2.8,2.0,4.2,
4.5,0.3,
8.1,7.3,0.4,
6.1,7.2,8.3,0.6,0.7,
3.7,

The process that generated the file placed a comma at the end of each
line. If you use the scan function to read this file, S-PLUS includes an
extra NA after each trailing comma. Instead, you can remove the
trailing commas and read the data into S-PLUS as follows:

> p2 <- pipe('perl -p -e "s/,$//" comma.txt')


> scan(p2, sep=",")

[1] 8.4 2.8 2.0 4.2 4.5 0.3 8.1 7.3 0.4 6.1 7.2 8.3 0.6
[14] 0.7 3.7

Using Perl to filter white space


Suppose you have a file named white.txt that contains white-space
delimited numbers. Some of the white space may be tabs, some may
be single spaces, and some might be multiple spaces:

4.02 4 2.03 1.62 4.67
2.15 2 4.83 4.87 2
4 4.38 1.83 4.38 4.73
4 4.28 5.45 1.77 4.22

Using Perl, you can replace the tabs and spaces between each pair of
numbers with a single space. You can then read the file into S-PLUS by
specifying a single white space as the delimiter. The following
commands show how to do this:

> p3 <- pipe('perl -p -e "s/[\\ \\t]+/ /g" white.txt')


> scan(p3, sep=" ")

[1] 4.02 4.00 2.03 1.62 4.67 2.15 2.00 4.83 4.87 2.00
[11] 4.00 4.38 1.83 4.38 4.73 4.00 4.28 5.45 1.77 4.22

Raw Data Objects

Raw data objects are structures that consist of undigested bytes of data.
They can be thought of naturally as vectors of byte data. You can
manipulate these objects in S-PLUS with the usual vector functions to
extract subsets, replace subsets, compute lengths, and define lengths.
In addition, raw data can be passed as arguments to functions,
included as slots or components in other objects, and assigned to any
database. However, raw data objects are not numeric and
cannot be interpreted as ordinary, built-in vectors. S-PLUS provides
no interpretation for the contents of the individual bytes: they don't
have an intrinsic order, NAs are not defined, and coercion to numbers
or integers is not defined. The only comparison operators that make
sense in this setting are equality and inequality, interpreted as
comparing two objects overall.
In S-PLUS, raw data is usually generated in four basic ways:
1. Read the data from a file or other connection using the
functions readMapped or readRaw. Conversely, you can write
raw data to a file or connection using writeRaw.
2. Use character strings that code bytes in either hex or ascii
coding. The character strings can then be given to the
functions rawFromHex and rawFromAscii to generate the raw
data.
3. Allocate space for a raw object and then fill it through a call to
C code via the .C interface.
4. Call an S-PLUS-dependent C routine through the .Call
interface.
See Chapter 15, Interfacing With C and Fortran Code, for details on
.C and .Call interfaces. For details on additional topics not discussed
here, see Chambers (1998).
The primary S-PLUS constructors for raw data are the rawData and
raw functions. The four approaches mentioned above usually arise
more often in practice, however. All raw data objects in S-PLUS have
class "raw", regardless of how they are generated.

Examples

# Generate raw data from an ascii character vector.


> rawFromAscii(letters[1:6])
rawData(6,c("64636261","6665"))

# Generate raw data from a hex-coded vector.


> rawFromHex(rep("3af", 4))
rawData(6,c("3aaff33a","aff3"))

Raw Data on Files and Connections

The readMapped function reads binary data of numeric or integer
modes from a file. Typical applications include reading data written
by another system or by a C or Fortran program. The function also
provides a way to share data with other systems, assuming you know
where the systems write data.
As its name suggests, readMapped uses memory mapping to associate
the input file with an S-PLUS object, so the data is not physically
copied. Therefore, the function is suitable for reading in large objects.
The connection may be open in advance or not; in either case,
readMapped never closes it since that invalidates the mapping. Note
that you can open a file, position it with the seek function, and map
the data starting from a position other than the beginning of the file.
See the section Support Functions for Connections (page 149) for
details on seek.
The readRaw function is like readMapped, but physically reads the
data. Thus, it is suitable for connections that are not ordinary files and
cannot be memory mapped. The writeRaw function is the counterpart
to readRaw; it writes the contents of an S-PLUS object in raw form to
either a file or a text connection. Only the data values are written,
however. The resulting file does not include structural information,
and any software that reads the values needs to know the type of data
on the file.

Examples
The following example writes twenty integers to a raw data file
named x.raw, and then reads the values back in using the readRaw
function.

> x <- c(rep(5,5), rep(10,5), rep(15,5), rep(20,5))


> x
[1] 5 5 5 5 5 10 10 10 10 10 15 15 15 15 15 20 20
[18] 20 20 20

> writeRaw(x, "x.raw")


NULL

To ensure the data are read into S-PLUS as integers, set the argument
what to integer() in the call to readRaw:

> x1 <- readRaw("x.raw", integer())


> x1
[1] 5 5 5 5 5 10 10 10 10 10 15 15 15 15 15 20 20
[18] 20 20 20

The next command reads only the first 10 integers into S-PLUS:

> x2 <- readRaw("x.raw", integer(10))
> x2
[1] 5 5 5 5 5 10 10 10 10 10

You can determine the amount of data that is read into S-PLUS in one
of two ways: the length argument to readRaw or the length of the what
argument. If length is given and positive, S-PLUS uses it to define the
size of the resulting S-PLUS object. Otherwise, the length of what (if
positive) defines the size. If length is not given and what has a length
of zero, all of the data on the file or connection is read.
The following example writes twenty double-precision numbers to a
raw data file named y.raw, and then reads the values back in using
readRaw. Note that the values in the vector y must be explicitly
coerced to doubles using the as.double function, so that S-PLUS does
not interpret them as integers.

> y <- rep(as.double(1:5), times=4)


> writeRaw(y, "y.raw")
NULL

To ensure the data are read into S-PLUS as double precision numbers,
set the argument what=double() in the call to readRaw:

> y1 <- readRaw("y.raw", double())


> y1
[1] 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5

In contrast to many S-PLUS functions, it is the mode of the what
argument that matters, not the class. Classes other than "numeric"
(integer and double) might have a numeric prototype, but readRaw
works the same for all of them, reading in numeric values.

WRAP-UP ACTIONS
The more complicated your function, the more likely it is to complete
with some loose ends dangling. For example, the function may create
temporary files, or alter S-PLUS session options and graphics
parameters. It is good programming style to write functions that run
cleanly without permanently changing the environment. Wrap-up
actions allow you to clean up loose ends in your functions.
The most important wrap-up action is to ensure that a function
returns the appropriate value or generates the desired side effect.
Thus, the final line of a function is often the name of the object to be
returned or an expression that constructs the object. See the section
Constructing Return Values (page 134) for examples.
To restore session options or specify arbitrary wrap-up actions, use the
on.exit function. With on.exit, you ensure the desired actions are
carried out whether or not the function completes successfully. For
example, highly recursive functions often overrun the default limit for
nested expressions. The expressions argument to the options
function governs this and is set to 256 by default. Here is a version of
the factorial function that raises the limit from 256 to 1024 and then
cleans up:

fac1024 <- function(n)
{
    old <- options(expressions = 1024)
    on.exit(options(old))
    if(n <= 1) { return(1) }
    else { n * Recall(n-1) }
}

The first line of fac1024 assigns the old session options to the object
old, and then sets expressions=1024. The call to on.exit resets the
old options when the function finishes. The Recall function is used to
make recursive calls in S-PLUS.
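The same save-and-restore pattern applies to any session option. The following is a minimal sketch rather than part of the original example set: the function name printDigits is hypothetical, and it assumes only the standard digits option and the behavior of options described above (it returns the old values of the options it changes).

```r
# printDigits is an illustrative name, not a built-in function.
printDigits <- function(x, digits)
{
    old <- options(digits = digits)   # change the option, keep the old value
    on.exit(options(old))             # restore it however the function exits
    print(x)
    invisible(x)
}

before <- options()$digits
printDigits(pi, 3)
after <- options()$digits             # same value as before
```

Because the restoring call is registered with on.exit, the digits option is reset even if print fails.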
Compare fac1024 with a function that uses the default limit on nested
expressions:

fac256 <- function(n)
{
    if(n <= 1) { return(1) }
    else { n * Recall(n-1) }
}

Here is the response from S-PLUS when each function is called with
n=80.0:

> fac1024(80.0)
[1] 7.156946e+118

> fac256(80.0)

Error: Expressions nested beyond limit (256)


only 30 of 110 frames dumped
only the first of 10 elements used for string value

Note

As defined, the fac1024 function must be called with a real argument such as 80.0. If you call it
with an integer such as 80, S-PLUS overflows and returns NA. See the section Integer Arithmetic
(page 83) for a full discussion of this behavior.

To remove temporary files, you can use on.exit together with the
unlink function. For example:

fcn.A <- function(data, file=tempfile("fcn"))
{
    on.exit(unlink(file))
    dput(data, file=file)
    #
    # additional commands
    #
}

The unlink function permanently deletes external files from inside of
S-PLUS.
Wrap-up actions specified by multiple calls to on.exit can be
executed sequentially. Alternatively, later actions can replace earlier
ones. The behavior that occurs is determined by the add argument to
on.exit; by default, add=T and actions are executed sequentially. For
example, the following function uses on.exit to both unlink a file and
restore graphics parameters:

fcn.B <- function(data, file=tempfile("fcn"))
{
    on.exit(unlink(file))
    oldpar <- par()
    on.exit(par(oldpar))
    par(mfrow = c(3,4))
    #
    # make some plots and edit some data
    #
    dput(data, file=file)
    ...
}
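
To see sequential execution in isolation, the sketch below registers two wrap-up actions that only print a message. The function name exitOrder is hypothetical, and add = T is passed explicitly on the second call so that the example does not rely on the default value of add.

```r
# exitOrder is an illustrative name; both wrap-up actions simply print.
exitOrder <- function()
{
    on.exit(cat("first wrap-up action\n"))
    on.exit(cat("second wrap-up action\n"), add = T)   # keep the first action
    cat("function body\n")
    invisible(NULL)
}

exitOrder()
```

The body message should appear first, followed by the two wrap-up messages in the order in which they were registered.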

If add=F, the new action replaces any pending wrap-up actions. For
example, suppose your function performs a long, iterative
computation and you want to write the last computed value to disk in
case of an error. You can use on.exit to accomplish this as follows:

fcn.C <- function()
{
    for(i in 1:10000) {
        result <- i^2
        on.exit(assign("intermediate.result", result,
            where=1), add=F)
    }
    on.exit(add=F)
    return(result)
}

If we call this function and then interrupt the computation with ESC,
we see that the object intermediate.result is created. If we let the
function complete, it is not:

> fcn.C()
User interrupt requested
Use traceback() to see the call stack

> intermediate.result
[1] 665856

> rm(intermediate.result)
> fcn.C()
[1] 1e+08

> intermediate.result
Problem: Object "intermediate.result" not found

WRITING SPECIAL FUNCTIONS

Operators

In addition to the built-in operators discussed in the section Function
Names and Operators (page 75), S-PLUS allows you to define your
own infix operators. Such operators must have names of the form
"%anything%", like the built-in operator "%*%". These operators are
ordinary functions, but because the string "%anything%" is not
syntactically a name, you can print them only by using the get
function:

> get("%*%")

function(x, y, ...)
UseMethod("%*%")

Here is the code for an operator that raises a matrix to a power:

"%^%" <- function(matrix, power)
{
    matrix <- as(matrix, "matrix")
    if(ncol(matrix) != nrow(matrix))
        stop("matrix must be square")
    if(length(power) != 1)
        stop("power must be a single number")
    # all.equal returns a character vector, not F, when its
    # arguments differ, so compare its result with T explicitly
    if(identical(all.equal(t(matrix), matrix), T)) {
        # this is a symmetric matrix
        e <- eigen(matrix)
        m <- e$vectors %*% diag(e$values^power) %*%
            t(e$vectors)
    }
    else {
        # this is an asymmetric matrix
        if(trunc(power) != power)
            stop("integer power required for matrix")
        m <- diag(ncol(matrix))
        if(power != 0)
            for(i in 1:abs(power))
                m <- m %*% matrix
        if(power < 0)
            m <- solve(m)
    }
    return(m)
}

Once defined, this operator can be used exactly as any other infix
operator:

> x <- matrix(c(2,1,1,1), ncol=2)


> x

[,1] [,2]
[1,] 2 1
[2,] 1 1

> x %^% 3

[,1] [,2]
[1,] 13 8
[2,] 8 5

You can also use this operator to find the inverse of a matrix:

> x %^% -1

[,1] [,2]
[1,] 1 -1
[2,] -1 2

User-defined operators have precedence equivalent to the built-in
operators %%, %/%, and %*%. See Table 4.1 (page 76) for a complete list
of built-in operators and their precedence.
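
As a small illustration of this precedence rule, the sketch below defines a hypothetical operator %avg% (not part of S-PLUS) that averages its two operands. It assumes, as in the standard S operator ordering, that the %...% operators bind more tightly than * and /.

```r
# "%avg%" is an illustrative operator, not a built-in one.
"%avg%" <- function(a, b) { (a + b) / 2 }

r1 <- 2 * 4 %avg% 6      # evaluated as 2 * (4 %avg% 6), giving 10
r2 <- (2 * 4) %avg% 6    # parentheses force the product first, giving 7
```

When in doubt about how an expression involving a user-defined operator is grouped, add parentheses as in the second line.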

Extraction and Replacement Functions

As we mention in the section Function Names and Operators (page
75), S-PLUS handles assignments in which the left side is a function
call differently from those in which the left side is a name. An
expression of the form f(x) <- value is evaluated as the following
assignment:

x <- "f<-"(x, value)

This requires a function named "f<-" that corresponds to f. In this
example, f is called an extraction function: it accepts a data object and
returns either a portion of the data or some attribute of it. The
function "f<-" is the corresponding replacement function: it replaces the
object extracted by f with a user-supplied value.
For example, the dim function returns the dim attribute of a matrix,
data frame, or array:

> x <- matrix(1:10, nrow=5)


> x

[,1] [,2]
[1,] 1 6
[2,] 2 7
[3,] 3 8
[4,] 4 9
[5,] 5 10

> dim(x)
[1] 5 2

The result from dim states that the matrix x has 5 rows and 2 columns.
The corresponding function "dim<-" replaces the dim attribute with a
user-specified value:

> dim(x) <- c(2,5)


> x

[,1] [,2] [,3] [,4] [,5]


[1,] 1 3 5 7 9
[2,] 2 4 6 8 10
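
The replacement syntax used above is shorthand for an ordinary function call followed by an assignment. The following sketch, which uses only the built-in "dim<-" function, writes the same operation both ways so the two results can be compared; the object names x and y are arbitrary.

```r
x <- matrix(1:10, nrow = 5)
y <- x

dim(x) <- c(2, 5)            # replacement syntax
y <- "dim<-"(y, c(2, 5))     # the equivalent explicit call and assignment

same <- all.equal(x, y)      # T: both objects are now 2 x 5 matrices
```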

S-PLUS includes many replacement functions. Most notably, functions
associated with subscripting often have corresponding replacements;
examples include "[<-" and "[[<-". In addition, functions associated
with attribute extraction have replacement functions; "dim<-",
"names<-", and "class<-" are examples. Because the names of
replacement functions are not syntactic names, you must use the get
function to examine their definitions:

> get("dim<-")

function(x, value)
.Internal("dim<-"(x, value), "S_replace", T, 10)

In general, you should define replacement functions whenever you
write new extraction functions. New extraction functions are
generally associated with newly-created attributes. As a simple
example, suppose we want a new attribute named doc that holds a
brief description of an object. The corresponding extraction function,
doc, starts naturally enough with the attr function. A simple version
of doc is the following one-liner:

doc <- function(x) { attr(x, "doc") }

The replacement function "doc<-" looks like:

"doc<-" <- function(x, value)
{
    attr(x, "doc") <- value
    return(x)
}

Two things are worth noting about the definition of "doc<-". First, it
returns the complete, modified object and not just the modified
attribute. Second, it performs no assignment; the S-PLUS evaluator
performs the actual assignment. These characteristics are essential for
writing clean replacement functions.
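
The second point can be checked directly. In the sketch below, which repeats the definition of "doc<-" so that it is self-contained, a direct call to the replacement function returns a modified copy and leaves its argument alone; only the replacement syntax reassigns the object. The object names v and w are arbitrary.

```r
"doc<-" <- function(x, value)
{
    attr(x, "doc") <- value
    return(x)
}

v <- 1:3
w <- "doc<-"(v, "three integers")    # direct call: returns a modified copy
untouched <- is.null(attr(v, "doc")) # T: v itself has no doc attribute yet
attr(w, "doc")

doc(v) <- "three integers"           # replacement syntax: v is reassigned
attr(v, "doc")
```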
The following commands use the "doc<-" function to add a doc
attribute to the built-in data set geyser. The attribute is then printed
with the doc function:

# Assign geyser to your working directory.


> geyser <- geyser

> doc(geyser) <- "Waiting time between eruptions and the


Continue string: duration of the eruption for the Old
Continue string: Faithful geyser in Yellowstone."

> doc(geyser)

[1] "Waiting time between eruptions and the\nduration of


the eruption for the Old\nFaithful geyser in Yellowstone."

Because of the newline characters, this is not the most readable form.
However, if we modify the doc function slightly to use cat instead, we
obtain output that is easier to read:

> doc <- function(x) { cat(attr(x, "doc"), sep="\n ") }


> doc(geyser)

Waiting time between eruptions and the


duration of the eruption for the Old
Faithful geyser in Yellowstone.

You can build extraction functions to extract almost any piece of data
that you are interested in. Such functions typically use other
extraction functions as their starting points. For example, the
following functions use subscripting to find the elements of an input
vector that have even and odd indices:

evens <- function(x)
{
    indices <- seq(along = x)
    return(x[indices %% 2 == 0])
}

odds <- function(x)
{
    indices <- seq(along = x)
    return(x[indices %% 2 == 1])
}

The following examples illustrate these functions:

> evens(1:10)
[1] 2 4 6 8 10

> odds(1:10)
[1] 1 3 5 7 9

In evens and odds, we build on the subscripting function "[" to
extract particular subsets of the input data. Thus, the subscripting
replacement function "[<-" is the logical place to start in writing the
corresponding replacements "evens<-" and "odds<-":

"evens<-" <- function(x, value)
{
    indices <- seq(along = x)
    x[indices %% 2 == 0] <- value
    return(x)
}

"odds<-" <- function(x, value)
{
    indices <- seq(along = x)
    x[indices %% 2 == 1] <- value
    return(x)
}

The following examples illustrate replacement using these two
functions:

> xx <- 1:10


> xx
[1] 1 2 3 4 5 6 7 8 9 10

> odds(xx) <- c(10,20,30,40,50)


> evens(xx) <- c(11,21,31,41,51)
> xx
[1] 10 11 20 21 30 31 40 41 50 51
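
Replacement functions can also take additional arguments between the object and the new value. The following sketch generalizes "evens<-" to every n-th element; the name everyNth and its argument n are hypothetical, and the example assumes that a call of the form f(x, n) <- value is evaluated as x <- "f<-"(x, n, value), by analogy with the one-argument form described earlier.

```r
# "everyNth<-" is an illustrative generalization of "evens<-".
"everyNth<-" <- function(x, n, value)
{
    indices <- seq(along = x)
    x[indices %% n == 0] <- value
    return(x)
}

zz <- 1:10
everyNth(zz, 3) <- 0     # elements 3, 6, and 9 are replaced
zz
```

As with "evens<-", the function returns the whole modified vector and leaves the assignment to the evaluator.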

As a final example of extraction and replacement, consider the
problem of extracting and replacing row names in a matrix.
Normally, you extract the names using the dimnames function and
replace them using "dimnames<-". However, it would be convenient
to simply type rownames(x) and see the row names of a matrix x.
Here is a simple function that does this:

rownames <- function(x)
{
    if(!is.null(dimnames(x)[[1]]))
        return(dimnames(x)[[1]])
    else
        return(character(dim(x)[1]))
}

If the first element of dimnames(x) is NULL, the rownames function
returns a vector of empty character strings that has length equal to the
number of rows in x.
The corresponding replacement function inserts new row names
while preserving any existing column names:

"rownames<-" <- function(x, value)
{
    if(!is.null(dimnames(x)[[2]]))
        colnames <- dimnames(x)[[2]]
    else
        colnames <- NULL
    dimnames(x) <- list(value, colnames)
    return(x)
}

The following commands illustrate the rownames and "rownames<-"
functions using the built-in data set state.x77:

> rownames(state.x77)

[1] "Alabama" "Alaska" "Arizona"


[4] "Arkansas" "California" "Colorado"
[7] "Connecticut" "Delaware" "Florida"
[10] "Georgia" "Hawaii" "Idaho"
[13] "Illinois" "Indiana" "Iowa"
[16] "Kansas" "Kentucky" "Louisiana"
[19] "Maine" "Maryland" "Massachusetts"
[22] "Michigan" "Minnesota" "Mississippi"
[25] "Missouri" "Montana" "Nebraska"
[28] "Nevada" "New Hampshire" "New Jersey"
[31] "New Mexico" "New York" "North Carolina"
[34] "North Dakota" "Ohio" "Oklahoma"
[37] "Oregon" "Pennsylvania" "Rhode Island"
[40] "South Carolina" "South Dakota" "Tennessee"
[43] "Texas" "Utah" "Vermont"
[46] "Virginia" "Washington" "West Virginia"
[49] "Wisconsin" "Wyoming"

# Assign state.x77 to your working directory.


> state.x77 <- state.x77
> rownames(state.x77) <- c(LETTERS[1:25], letters[1:25])
> rownames(state.x77)

[1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L"
[13] "M" "N" "O" "P" "Q" "R" "S" "T" "U" "V" "W" "X"
[25] "Y" "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k"
[37] "l" "m" "n" "o" "p" "q" "r" "s" "t" "u" "v" "w"
[49] "x" "y"
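
The state.x77 example starts from a matrix that already has row names. The sketch below exercises the other branch of each function on a small matrix with no dimnames at all. It repeats the two definitions given earlier so that it is self-contained; the object names m, r0, and r1 are arbitrary.

```r
rownames <- function(x)
{
    if(!is.null(dimnames(x)[[1]]))
        return(dimnames(x)[[1]])
    else
        return(character(dim(x)[1]))
}

"rownames<-" <- function(x, value)
{
    if(!is.null(dimnames(x)[[2]]))
        colnames <- dimnames(x)[[2]]
    else
        colnames <- NULL
    dimnames(x) <- list(value, colnames)
    return(x)
}

m <- matrix(1:6, nrow = 3)          # no dimnames yet
r0 <- rownames(m)                   # three empty strings
rownames(m) <- c("a", "b", "c")     # column names remain NULL
r1 <- rownames(m)
```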

REFERENCES
Chambers, J.M. (1998). Programming with Data: A Guide to the S
Language. New York: Springer-Verlag.
Venables, W.N. and Ripley, B.D. (2000). S Programming. New York:
Springer-Verlag.

5  IMPORTING AND EXPORTING

Supported File Types for Importing and Exporting    172
Importing Data                                      176
Using the importData Function                       176
Other Data Import Functions                         188
Exporting Data                                      192
Using the exportData Function                       192
Other Data Export Functions                         194
Exporting Graphs                                    196
Specifying the ExportType Argument                  197
Specifying the QFactor Argument                     199
Specifying the ColorBits Argument                   199
Creating HTML Output                                202
Tables                                              202
Text                                                203

SUPPORTED FILE TYPES FOR IMPORTING AND EXPORTING
Table 5.1 lists all the supported file formats for importing and
exporting data. Note that S-PLUS both imports from and exports to all
the listed types with two exceptions: SigmaPlot (.jnb) files are import
only and HTML (.htm*) tables are export only.

Table 5.1: Supported file types for importing and exporting data.

Format                     Type                   Default Extension        Notes

ASCII File                 "ASCII"                .csv                     Comma delimited.
                                                  .asc, .csv, .txt, .prn   Delimited.
                                                  .asc, .dat, .txt, .prn   Whitespace delimited; space delimited;
                                                                           tab delimited; user-defined delimiter.
dBASE File                 "DBASE"                .dbf                     II, II+, III, IV files.
Epi Info File              "EPI"                  .rec
Fixed Format ASCII File    "FASCII"               .fix, .fsc
FoxPro File                "FOXPRO"               .dbf
Gauss Data File            "GAUSS", "GAUSS96"     .dat                     Automatically reads the related DHT
                                                                           file, if any, as GAUSS 89. If no DHT file
                                                                           is found, reads the .DAT file as
                                                                           GAUSS96.
HTML Table                 "HTML"                 .htm*                    Export only.
Lotus 1-2-3 Worksheet      "LOTUS"                .wk*, .wr*
Matlab Matrix              "MATLAB"               .mat                     File must contain a single matrix.
                                                                           Versions 4 and lower: import/export;
                                                                           Version 5: import only.
Minitab Workbook           "MINITAB"              .mtw                     Versions 8 through 12.
Microsoft Access File      "ACCESS"               .mdb
Microsoft Excel Worksheet  "EXCEL"                .xl?                     Versions 2.1 through 2000.
Microsoft SQL Server       "MS-SQL"               .sql
ODBC Database              "ODBC"                 Not applicable           For Informix (.ifx), Oracle (.ora), and
                                                                           SYBASE (.syb) databases.
Paradox Data File          "PARADOX"              .db
QuattroPro Worksheet       "QUATTRO"              .wq?, .wb?
S-PLUS File                "SPLUS"                .sdd                     Windows, DEC UNIX. Uses
                                                                           data.restore() to import file.
SAS File                   "SAS", "SASV6"         .sd2                     SAS version 6 files, Windows.
                           "SAS1", "SAS6UX32"     .ssd01                   SAS version 6 files, HP, IBM, Sun
                                                                           UNIX.
                           "SAS4", "SAS6UX64"     .ssd04                   SAS version 6 files, Digital UNIX.
                           "SAS7"                 .sas7bdat, .sd7          SAS version 7 or 8 files, current
                                                                           platform.
                           "SAS7WIN"              .sas7bdat, .sd7          SAS version 7 or later data files,
                                                                           Windows.
                           "SAS7UX32"             .sas7bdat, .sd7          SAS version 7 or later data files, Solaris
                                                                           (SPARC), HP-UX, IBM AIX.
                           "SAS7UX64"             .sas7bdat, .sd7          SAS version 7 or later data files,
                                                                           Digital/Compaq UNIX.
SAS Transport File         "SAS_TPT"              .xpt, .tpt               Version 6.x. Some special export
                                                                           options may need to be specified in
                                                                           your SAS program. We suggest using
                                                                           the SAS Xport engine (not PROC
                                                                           CPORT) to read and write these files.
SigmaPlot File             "SIGMAPLOT"            .jnb                     Import only.
SPSS Data File             "SPSS"                 .sav                     OS/2; Windows; HP, IBM, Sun, DEC
                                                                           UNIX.
SPSS Portable File         "SPSSP"                .por
Stata Data File            "STATA"                .dta                     Versions 2.0 and higher.
SYSTAT File                "SYSTAT"               .syd, .sys               Double- or single-precision .sys files.

IMPORTING DATA

Using the importData Function

The principal tool for importing data is the importData function,
which can be invoked from either the S-PLUS prompt or the File >
Import Data menu option.
In most cases, all you need to do to import a data file is to call
importData with the name of the file to be imported as the only
argument. As long as the specified file has one of the default
extensions listed in Table 5.1, you need not specify a type nor, in most
cases, any other information.
For example, suppose you have a SAS data file named rain.sd2 in
your start-up folder. You can read this file into S-PLUS using
importData as follows:

> myRain <- importData("rain.sd2")

If you have trouble reading the data, most likely you just need to
supply additional arguments to importData to specify extra
information required by the data importer to read the data correctly.
Table 5.2 lists the arguments to the importData function.
Table 5.2: Arguments to importData.

Required or
Argument Optional Description

file Required A character string specifying the name of the file


(except for and directory path.
database reads)

type Optional A character string specifying the file type of the file
to be imported. See the “Type” column of Table 5.1
for a list of possible values.

keep Optional A character vector of variable names, or a numeric


vector of column numbers, specifying which
variables are to be imported. Only one of keep or
drop may be specified.

176
Importing Data

Table 5.2: Arguments to importData. (Continued)

Required or
Argument Optional Description

drop Optional A character vector of variable names, or a numeric


vector of column numbers, specifying which
variables are not to be imported. Only one of keep
or drop may be specified.

colNames Optional A character vector of names to use for the imported


columns.

rowNamesCol Optional An integer specifying the column that contains the


row names. If specified, the column of row names is
dropped from the resulting data frame.

filter Optional A character string containing a logical expression


for selecting the rows to be imported. For details,
see Filter Expressions on page 180.

format Optional A single character string specifying the format for


each field when importing from a formatted ASCII
(FASCII) text file. For details, see Notes on
Importing Files of Certain Types on page 182.

delimiter Optional A character string specifying the delimiter to use.


This argument is used only when importing ASCII
text files.

startCol Optional An integer specifying the starting column in the


source. For example, if you specify 5, S-PLUS begins
reading the data at column 5.

endCol Optional An integer specifying the ending column in the


source. The default of -1 means to read to the last
column.

startRow Optional An integer specifying the starting row in the source


(used only for spreadsheets). For example, if you
specify 10, S-PLUS begins reading the data at row 10.

177
Chapter 5 Importing and Exporting

Table 5.2: Arguments to importData. (Continued)

Required or
Argument Optional Description

endRow Optional An integer specifying the ending row in the source


(used only for spreadsheets). The default of -1
means to read to the last row.

pageNumber Optional The page number of the spreadsheet (used only for
spreadsheets).

colNameRow Optional An integer specifying the row that contains the


column names (used only for spreadsheets). If you
do not specify a row, S-PLUS attempts to locate
column names in the first row of the file. Specify 0
to tell S-PLUS not to search for a column names row.
In a delimited ASCII file, the column names row
must come before the first data row (startRow) to be
read.

server Optional When importing from a relational database, a


character string specifying the database server.

user Optional When importing from a relational database, a


character string specifying the user name.

password Optional A character string specifying the password for the


database user.

database Optional A character string specifying the name of the


database to use when importing from a relational
database. This should be set to "" if type="ORACLE".

table Optional A character string specifying the name of the table


in database to import.

stringsAsFactors Optional A logical flag. If TRUE, strings are converted to


factors when imported.

178
Importing Data

Table 5.2: Arguments to importData. (Continued)

Required or
Argument Optional Description

sortFactorLevels Optional A logical flag. If TRUE, levels for any factors created
from strings are sorted.

valueLabelAsNumber Optional A logical flag. If TRUE, SAS and SPSS variables with
labels are imported as numbers.

centuryCutoff Optional A numeric value specifying the origin for two-digit
dates. Dates with two-digit years are assigned to the
100-year span beginning with this value. The default
value of 1930 means that the date 6/15/30 will be
read as June 15, 1930 and 12/29/29 will be read as
December 29, 2029. This argument is used only
when importing from an ASCII file.

separateDelimiters Optional A logical flag. If TRUE, the separator is strictly a
single character; otherwise, repeated consecutive
separator characters are treated as one separator.

odbcConnection Required if type="ODBC" An encrypted character string containing the
ODBC connection string.

odbcSqlQuery Optional Contains an optional SQL query. If no query is
specified, the first table of the data source is used.
Meaningful only if type="ODBC".

readAsTable Optional A logical flag. If TRUE, S-PLUS reads the entire file as
a single table.

colNamesUpperCase Optional A logical flag. If TRUE, column names are imported
in all uppercase.

time.in.format Optional A character string specifying the format to use to
interpret date/time data when importing from an
ASCII or FASCII text file.
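
The centuryCutoff rule can be illustrated with a few lines of S code. The
following is only a sketch of the rule as it is described in the table, not the
code that importData itself uses; expand.year is a hypothetical helper written
for this illustration.

```r
# Hypothetical helper (not part of S-PLUS): expand two-digit years
# according to the 100-year span that begins at centuryCutoff.
expand.year <- function(yy, centuryCutoff = 1930)
{
    # First year of the century that contains the cutoff, e.g. 1900
    century <- centuryCutoff - centuryCutoff %% 100
    year <- century + yy
    # Years that fall before the cutoff belong to the following century
    ifelse(year < centuryCutoff, year + 100, year)
}

expand.year(c(30, 29))                        # 1930 2029
expand.year(c(30, 29), centuryCutoff = 1950)  # 2030 2029
```

With the default cutoff, the two-digit years 30 and 29 map to 1930 and 2029,
which matches the dates quoted in the description of centuryCutoff.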

Filter Expressions

The filter argument to importData allows you to subset the data you
import. By specifying a query, or filter, you gain additional
functionality, such as taking a random sample of the data. Use the
following examples and explanation of the filter syntax to create your
statement. A blank filter is the default and results in all data being
imported.

Note

The filter argument is ignored if the type argument (or, equivalently, the file extension specified in
the file argument) is set to "ASCII" or "FASCII".

Case selection
You select cases by using a case-selection statement in the filter
argument. The case-selection or where statement has the following
form:

"variable expression relational operator condition"

Warning

The syntax used in the filter argument to importData and exportData is not standard S-PLUS
syntax, and the expressions described are not standard S-PLUS expressions. Do not use the
syntax described in this section for any purpose other than passing a filter argument to
importData or exportData.

Variable expressions
You can specify a single variable or an expression involving several
variables. All of the usual arithmetic operators (+ - * / ()) are
available for use in variable expressions, as well as the relational
operators listed in Table 5.3.

Table 5.3: Relational operators.

Operator Description

== Equal to

!= Not equal to


< Less than

> Greater than

<= Less than or equal to

>= Greater than or equal to

& And

| Or

! Not

Examples
Examples of selection conditions given by filter expressions are:

"sex = 1 & age < 50"
"(income + benefits) / famsize < 4500"
"income1 >= 20000 | income2 >= 20000"
"income1 >= 20000 & income2 >= 20000"
"dept = 'auto loan'"

Note that strings used in case-selection expressions must be enclosed
in single quotes if they contain embedded blanks.

Wildcards * or ? are available to select subgroups of string variables.
For example:

"account = ????22"
"id = 3*"

The first statement will select any accounts that have 2s as the 5th and
6th characters in the string, while the second statement will select
strings of any length that begin with 3.
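
Because a filter is an ordinary character string, it can be assembled with
paste before it is passed to importData. The sketch below builds filters of the
kinds shown in this section, including the comma-separated value lists
described next. The helper names value.filter and import.filtered are
hypothetical, and the wrapper assumes a file type other than ASCII or FASCII,
since the filter is ignored for those types.

```r
# Hypothetical helper: build "variable op value1,value2,..." as a filter
value.filter <- function(variable, values, op = "=")
{
    paste(variable, op, paste(values, collapse = ","))
}

west <- value.filter("state", c("CA", "WA", "OR", "AZ", "NV"))
west                       # "state = CA,WA,OR,AZ,NV"
value.filter("id", "3*")   # "id = 3*"

# Hypothetical wrapper: import only the cases selected by the filter.
# The file type is inferred from the extension of path.
import.filtered <- function(path, filter)
{
    importData(file = path, filter = filter)
}
```

Building the string in one place keeps the filter syntax, which is not
standard S-PLUS syntax, separate from the rest of your code.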

The comma operator is used to list different values of the same
variable name that will be used as selection criteria. It allows you to
bypass lengthy OR expressions when giving lists of conditional
values. For example:

"state = CA,WA,OR,AZ,NV"
"caseid != 22*,30??,4?00"

Missing variables
You can test whether a variable is missing by comparing it to the
special internal variable NA. For example, the following filter selects
cases in which neither income nor age is missing:

"income != NA & age != NA"

Notes on Importing Files of Certain Types

ASCII (delimited ASCII) files

When importing ASCII files, you have the option of specifying
column names and data types for imported columns. This can be
useful if you want to name columns or to skip over one or more
columns when importing.
Use the format argument to importData to specify the data types of
the imported columns. (Note that field-width specifications are
irrelevant for ASCII files and are ignored.) For each column, you
need to specify a percent sign (%) and then the data type. Dates may
be imported automatically as numbers. After importing, you can
change the column format type to a dates format.
Here is an example format string:

%s, %f, %*, %f

The s denotes a string data type, the f denotes a float data type
(actually, numeric), and the asterisk (*) denotes a “skipped” column.
These are the only allowable format types.
If you do not specify the data type of each column, S-PLUS looks at
the first row of data to be read and uses the contents of this row to
determine the data type of each column. A row of data must always
end with a new line.
S-PLUS auto-detects the file delimiter from a preset list that includes
commas, spaces, and tabs. All cells must be separated by the same
delimiter (that is, each file must be comma-separated, space-separated,
or tab-separated). Multiple delimiter characters are not
grouped and treated as a single delimiter. For example, if the
comma is a delimiter, two commas are interpreted as a missing field.
Double quotes ("") are treated specially. They are always treated as an
“enclosure” marker and must always come in pairs. Any data
contained between double quotes are read as a single unit of character
data. Thus, spaces and commas can be used as delimiters, and spaces
and commas can still be used within a character field as long as that
field is enclosed within double quotes. Double quotes cannot be used
as standard delimiters.
If a variable is specified to be numeric, and if the value of any cell
cannot be interpreted as a number, that cell is filled with a missing
value. Incomplete rows are also filled with missing values.

FASCII (formatted ASCII) files

You can use FASCII import to specify how each character in your
imported file should be treated. For example, you must use FASCII
for fixed-width columns not separated by delimiters, if the rows in
your file are not separated by line feeds, or if your file splits each row
of data into two or more lines.
For FASCII import, you need to specify the file name and the file
type. In addition, because FASCII files are assumed to be
nondelimited (for example, there are no commas or spaces separating
fields), you also need to specify each column’s field width and data
type in the format string. This tells S-PLUS where to separate the
columns. Each column must be listed along with its data type
(character or numeric) and its field width. If you want to name the
columns, specify a list of names in the colNames argument (column
names cannot be read from a FASCII data file).
When importing a FASCII file, you need to specify a value for the
colNames argument to importData. Enter a character vector of
column names for the imported data columns (separated by spaces or
commas). Specify one column name for each imported column (for
example, Apples, Oranges, Pears). You can use an asterisk (*) to
denote a missing name (for example, Apples, *, Pears).
When importing a FASCII file, you also need to specify the data
types and field widths of the imported columns by entering a value
for the format argument to importData. For each column, you need to
specify a percent sign (%), then the field width, and then the data type.
Commas or spaces must separate each specification in the string. The
format string is necessary because formatted ASCII files do not have
delimiters (such as commas or spaces) separating each column of
data.
Here is an example format string:

%10s, %12f, %5*, %10f

The numbers denote the column widths, s denotes a string data type,
f denotes a float data type, and the asterisk (*) denotes a “skip.” You
may need to skip characters when you want to avoid importing some
characters in the file. For example, you may want to skip blank
characters or even certain parts of the data.
If you want to import only some of the rows, specify a starting and
ending row.
If each row ends with a new line, S-PLUS treats the newline character
as a single character-wide variable that is to be skipped.
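
Format strings for both ASCII and FASCII files follow a regular pattern, so
they can be generated rather than typed. The following sketch builds the two
example format strings shown in this section. The helpers make.format and
import.fixed are hypothetical names introduced for illustration, and passing
colNames as a character vector is an assumption based on the argument
description.

```r
# Hypothetical helper: build a format string from column types ("s",
# "f", or "*") and, for FASCII files, the matching field widths.
make.format <- function(types, widths = NULL)
{
    if(is.null(widths)) {
        spec <- paste("%", types, sep = "")
    } else {
        spec <- paste("%", widths, types, sep = "")
    }
    paste(spec, collapse = ", ")
}

make.format(c("s", "f", "*", "f"))                    # "%s, %f, %*, %f"
make.format(c("s", "f", "*", "f"), c(10, 12, 5, 10))  # "%10s, %12f, %5*, %10f"

# Hypothetical wrapper: import a fixed-width file, naming the columns
# that are not skipped.
import.fixed <- function(path, types, widths, colNames)
{
    importData(file = path, type = "FASCII",
        format = make.format(types, widths), colNames = colNames)
}
```

For the FASCII format above, three columns are imported and one is skipped, so
a call such as import.fixed(path, types, widths, c("Apples", "Oranges",
"Pears")) would supply three names.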

Microsoft Excel files


If your Excel worksheet contains numeric data only in a rectangular
block, starting in the first row and column of the worksheet, then all
you need to specify is the file name and file type. If a row contains
names, specify the number of that row in the colNameRow argument (it
does not have to be the first row). You can select a rectangular subset
of your worksheet by specifying starting and ending columns and
rows. Excel-style column names (for example, A, AB) can be used to
specify the starting and ending columns.

Lotus files
If your Lotus-type worksheet contains numeric data only in a
rectangular block, starting in the first row and column of the
worksheet, then all you need to specify is the file name and file type.
If a row contains names, specify the number of that row in the
colNameRow argument (it does not have to be the first row). You can
select a rectangular subset of your worksheet by specifying starting
and ending columns and rows. Lotus-style column names (for
example, A, AB) can be used to specify the starting and ending
columns.


The row specified as the starting row is always read first to determine
the data types of the columns. Therefore, there cannot be any blank
cells in this row. In other rows, blank cells are filled with missing
values.

dBASE files
S-PLUS imports dBASE and dBASE-compatible files. The file name
and file type are often the only things you need to specify for
dBASE-type files. Column names and data types are obtained from the
dBASE file. However, you can select a rectangular subset of your data
by specifying starting and ending columns and rows.

Data from ODBC data sources


To access a database on a remote server, S-PLUS must establish a
communication link to the server across the network. The
information required to create this link is contained in an ODBC
connection string. This string consists of one or more attributes that
specify how a driver connects to a data source. An attribute identifies
a specific piece of information that the driver needs to know before it
can make the appropriate data source connection. Each driver may
have a different set of attributes, but the connection string is always of
the form:

DSN=dataSourceName [;SERVER=value] [;PWD=value]
[;UID=value] [;<Attribute>=<value>]

You must specify the data source name if you do not specify the user
ID, password, server, and driver attributes. However, all other
attributes are optional. If you do not specify an attribute, that attribute
defaults to the value specified in the relevant DSN tab of the ODBC
Data Source Administrator.

Note

For some drivers, attribute values are case-sensitive.

For example, a connection string that connects to the Employees data
source using the hr.db server and user joesmith’s account
information would be:

"DSN=Employees;UID=joesmith;PWD=secret;SERVER=hr.db"

The S-PLUS GUI encrypts ODBC connection strings to protect
sensitive information such as user IDs and passwords. To connect to
your database from the command line with an encrypted connection
string, first establish connectivity from the GUI and then examine
your history log by choosing Windows > History > Display from
the main menu or clicking the History Log button on the
Standard toolbar. Simply copy the encrypted connection string into
your script or to the Commands window.
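
An unencrypted connection string of the form shown earlier can also be
assembled from its attributes. The sketch below reproduces the Employees
example; odbc.string is a hypothetical helper, not an S-PLUS function, and the
attribute names are simply those used in that example.

```r
# Hypothetical helper: join named attributes into "NAME=value;NAME=value"
odbc.string <- function(attrs)
{
    paste(names(attrs), attrs, sep = "=", collapse = ";")
}

conn <- odbc.string(c(DSN = "Employees", UID = "joesmith",
    PWD = "secret", SERVER = "hr.db"))
conn   # "DSN=Employees;UID=joesmith;PWD=secret;SERVER=hr.db"
```

Keep in mind that a string built this way contains the password in plain text,
which is the situation the encrypted strings from the GUI are meant to avoid.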

Note

ODBC import and export facilities do not support "nchar" or "nvarchar" data types. The
"varchar" type is supported.

To import data from a database via ODBC, use the standard
importData function with the type="ODBC" argument. Three additional
parameters control the call to the ODBC interface:
• file supplies the name of the data source;
• odbcConnection supplies the ODBC connection string;
• odbcSqlQuery supplies an optional SQL query. For example,
this query would specify the table you want to import. If no
query is specified, the first table of the data source is used.
For example, this command creates a new data frame called
myDataSet and fills it with the contents of Table23 from data source
testSQLServer:

> myDataSet <- importData(
+     file = "testSQLServer",
+     type = "ODBC",
+     odbcConnection = paste("DSN=testSQLServer;UID=joesmith;PWD=secret;",
+         "APP=S-PLUS;WSID=joesComputer;DATABASE=testdba", sep = ""),
+     odbcSqlQuery = "Select * from testdba.dbo.Table23")

You can use the filter argument in the importData function to filter
data, as described in Filter Expressions.

To export data from S-PLUS via ODBC, use the standard exportData
function with the type="ODBC" argument. Four additional parameters
control the call to the ODBC interface:
• data supplies the data frame to be exported;
• file supplies the name of the data source;
• odbcConnection supplies the ODBC connection string;
• odbcTable supplies the name of the table to be created.
For example, this command exports the data frame myDataSet to
Table23 of data source testSQLServer:

> exportData(data = myDataSet, file = "testSQLServer",
+     type = "ODBC",
+     odbcConnection = paste("DSN=testSQLServer;UID=joesmith;PWD=secret;",
+         "APP=S-PLUS;WSID=joesComputer;DATABASE=testdba", sep = ""),
+     odbcTable = "Table23")

Beware that if you export data to an existing table name, it is possible
to change the schema for that table. This is because S-PLUS
essentially replaces the existing table with a new table containing the
exported data. Also note that it is not possible to append data to a
table. If you wish to append data to an existing table, export the data
to a dummy table and then use SQL commands on the database side
to combine the two tables.
A new function in S-PLUS 6.0 is executeSql, which sends arbitrary
SQL statements to a database via ODBC. The function has the
following form:

executeSql(odbcConnection = character(0),
    odbcSqlQuery = character(0), returnData)

where
• odbcConnection is the connection string to the database;
• odbcSqlQuery is the statement passed to the database;
• returnData is the flag to return the data (default=F).
The following is an example of adding a record to an existing table:

executeSql("DSN=mydatabase",
    "INSERT into mytable values ('Hello')")

Note that if returnData is set to T, the SQL will be evaluated twice.
Statements that modify the database, such as the INSERT above, should
therefore be run with the default returnData=F.
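
The append workaround described above (export to a dummy table, then combine
the tables on the database side) can be sketched as follows. The function
names, the staging table name, and the INSERT ... SELECT statement are
assumptions made for illustration; the SQL your database accepts may differ.

```r
# Hypothetical helper: SQL that copies all rows from a staging table
append.sql <- function(target, staging)
{
    paste("INSERT INTO", target, "SELECT * FROM", staging)
}

# Hypothetical wrapper around the two steps
append.rows <- function(newRows, dsn, conn, target, staging = "splusStaging")
{
    # Step 1: export to the staging table; this replaces any existing
    # table of that name
    exportData(data = newRows, file = dsn, type = "ODBC",
        odbcConnection = conn, odbcTable = staging)
    # Step 2: copy the rows on the database side; returnData is left
    # at F so that the statement is evaluated only once
    executeSql(odbcConnection = conn,
        odbcSqlQuery = append.sql(target, staging), returnData = F)
}

append.sql("Table23", "splusStaging")
# "INSERT INTO Table23 SELECT * FROM splusStaging"
```

The staging table must have the same columns as the target table for the copy
to succeed.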

Other Data Import Functions

While importData is the recommended method for reading data files
into S-PLUS, there are several other functions that you can use to read
ASCII data. These functions are commonly used by other functions
in S-PLUS so it is a good idea to familiarize yourself with them.

The scan Function

The scan function, which can read either from standard input or from
a file, is commonly used to read data from keyboard input. By default,
scan expects numeric data separated by white space, although there
are options that let you specify the type of data being read and the
separator. When using scan to read data files, it is helpful to think of
each line of the data file as a record, or case, with individual
observations as fields. For example, the following expression creates a
matrix named x from a data file specified by the user:

x <- matrix(scan("filename"), ncol = 10, byrow = T)

Here the data file is assumed to have 10 columns of numeric data; the
matrix contains a number of observations for each of these ten
variables. To read in a file of character data, use scan with the what
argument:

x <- matrix(scan("filename", what = ""), ncol=10, byrow=T)

Any character vector can be used in place of "". For most efficient
memory allocation, what should be the same size as the object to be
read in. For example, to read in a character vector of length 1000, use

> scan(what=character(1000))

The what argument to scan can also be used to read in data files of
mixed type, for example, a file containing both numeric and
character data, as in the following sample file, table.dat:

Tom 93 37
Joe 47 42
Dave 18 43

In this case, you provide a list as the value for what, with each list
component corresponding to a particular field:

> z <- scan("table.dat",what=list("",0,0))
> z
[[1]]:
[1] "Tom" "Joe" "Dave"

[[2]]:
[1] 93 47 18

[[3]]:
[1] 37 42 43

S-PLUS creates a list with separate components for each field specified
in the what list. You can turn this into a matrix, with the subject names
as column names, as follows:

> matz <- rbind(z[[2]],z[[3]])
> dimnames(matz) <- list(NULL, z[[1]])
> matz
     Tom Joe Dave
[1,]  93  47   18
[2,]  37  42   43
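
If you name the components of the what list, the list that scan returns
carries the same names, which makes the fields easier to refer to. The
following self-contained sketch writes a copy of the sample data to a temporary
file first; the component names name, score1, and score2 are hypothetical.

```r
# Write the sample records to a temporary file
datafile <- tempfile()
cat("Tom 93 37\nJoe 47 42\nDave 18 43\n", file = datafile)

# Named components in what give named components in the result
z <- scan(datafile, what = list(name = "", score1 = 0, score2 = 0))
z$name      # "Tom" "Joe" "Dave"
z$score1    # 93 47 18

# Assemble a data frame with one row per subject
scores <- data.frame(score1 = z$score1, score2 = z$score2,
    row.names = z$name)
scores["Joe", "score2"]   # 42
unlink(datafile)
```

This arrangement has one row per subject, whereas matz above has one column
per subject.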

You can scan files containing multiple line records by using the
argument multi.line=T. For example, suppose you have a file
heart.all containing information in the following form:

johns 1
450 54.6
marks 1 760 73.5
. . .

You can read it in with scan as follows:

> scan('heart.all',what=list("",0,0,0),multi.line=T)
[[1]]:
[1] "johns" "marks" "avery" "able" "simpson"
. . .
[[4]]:
[1] 54.6 73.5 50.3 44.6 58.1 61.3 75.3 41.1 51.5 41.7 59.7
[12] 40.8 67.4 53.3 62.2 65.5 47.5 51.2 74.9 59.0 40.5

If your data file is in fixed format, with fixed-width fields, you can use
scan to read it in using the widths argument. For example, suppose
you have a data file dfile with the following contents:

01giraffe.9346H01-04
88donkey .1220M00-15
77ant         L04-04
20gerbil .1220L01-12
22swallow.2333L01-03
12lemming     L01-23

You identify the fields as numeric data of width 2, character data of
width 7, numeric data of width 5, character data of width 1, numeric
data of width 2, a hyphen or minus sign that you don’t want to read
into S-PLUS, and numeric data of width 2. You specify these types
using the what argument to scan. To simplify the call to scan, you
define the list of what arguments separately:

> dfile.what <- list(code=0, name="", x=0, s="", n1=0,
+ NULL, n2=0)

(NULL suppresses scanning of the corresponding field.) You specify
the widths as the widths argument to scan. Again, it simplifies the call
to scan to define the widths vector separately:

> dfile.widths <- c(2, 7, 5, 1, 2, 1, 2)

You can now read the data in dfile into S-PLUS by calling scan as
follows:

> dfile <- scan("dfile", what=dfile.what,
+ widths=dfile.widths)

If some of your fixed-format character fields contain leading or
trailing white space, you can use the strip.white argument to strip it
away. (The scan function always strips white space from numeric
fields.) See the scan help file for more details.

The read.table Function

Data frames in S-PLUS were designed to resemble tables. They must
have a rectangular arrangement of values and typically have row and
column labels. Data frames arise frequently in designed experiments
and other situations. If you have a text file with data arranged in the
form of a table, you can read it into S-PLUS using the read.table
function. For example, consider a data file named auto.dat that
contains the records listed below.

Model Price Country Reliab Mileage Type
AcuraIntegra4 11950 Japan 5 NA Small
Audi1005 26900 Germany NA NA Medium
BMW325i6 24650 Germany 94 NA Compact
ChevLumina4 12140 USA NA NA Medium
FordFestiva4 6319 Korea 4 37 Small
Mazda929V6 23300 Japan 5 21 Medium
MazdaMX-5Miata 13800 Japan NA NA Sporty
Nissan300ZXV6 27900 Japan NA NA Sporty
OldsCalais4 9995 USA 2 23 Compact
ToyotaCressida6 21498 Japan 3 23 Medium

All fields are separated by spaces, and the first line is a header line. To
create a data frame from this data file, use read.table as follows. Note
in the output that the Model values supply the row names of the data frame:

> auto <- read.table('auto.dat',header=T)
> auto
                Price Country Reliab Mileage    Type
AcuraIntegra4   11950   Japan      5      NA   Small
Audi1005        26900 Germany     NA      NA  Medium
BMW325i6        24650 Germany     94      NA Compact
ChevLumina4     12140     USA     NA      NA  Medium
FordFestiva4     6319   Korea      4      37   Small
Mazda929V6      23300   Japan      5      21  Medium
MazdaMX-5Miata  13800   Japan     NA      NA  Sporty
Nissan300ZXV6   27900   Japan     NA      NA  Sporty
OldsCalais4      9995     USA      2      23 Compact
ToyotaCressida6 21498   Japan      3      23  Medium

As with scan, you can use read.table within functions to hide the
mechanics of S-PLUS from the users of your functions.
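
As a sketch of wrapping read.table in a function, the following reads a table
and keeps only the rows whose Reliab value is present and at least a given
threshold. The function name read.reliable, its min.reliab argument, and the
three-row sample file are hypothetical.

```r
# Hypothetical wrapper: read a table and keep the reliable models
read.reliable <- function(path, min.reliab = 3)
{
    auto <- read.table(path, header = T)
    # Rows with a missing Reliab value are dropped as well
    keep <- !is.na(auto$Reliab) & auto$Reliab >= min.reliab
    auto[keep, ]
}

# Small sample file with a header line
datafile <- tempfile()
cat("Model Price Reliab\n",
    "FordFestiva4 6319 4\n",
    "OldsCalais4 9995 2\n",
    "Audi1005 26900 NA\n", sep = "", file = datafile)

good <- read.reliable(datafile)
good$Price   # 6319
unlink(datafile)
```

Users of read.reliable need to know only the file name; the details of the
call to read.table and of the subsetting stay inside the function.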


EXPORTING DATA

Using the exportData Function

You use the exportData function to export S-PLUS data objects to
formats for applications other than S-PLUS. (To export data for use by
S-PLUS, use the data.dump function, described under The data.dump
Function.) You can invoke exportData from either the S-PLUS prompt
or the File > Export Data menu option.

When exporting to most file types with exportData, you typically
need to specify only the data set, file name, and (depending on the file
name you specified) the file type, and the data will be exported into a
new data file using default settings. For greater control, you can
specify your own settings by using additional arguments to
exportData. Table 5.4 lists the arguments to the exportData function.

Table 5.4: Arguments to exportData.

Argument  Required or Optional  Description

data Required The data frame or matrix to be exported.

file Required A character string specifying the name of the export
file to create.

type Optional A character string specifying the file type of the
export file. See the “Type” column of Table 5.1 for a
list of possible values.

keep Optional A character vector of variable names, or a numeric
vector of column numbers, specifying which
variables are to be exported. Only one of keep or
drop may be specified.

drop Optional A character vector of variable names, or a numeric
vector of column numbers, specifying which
variables are not to be exported. Only one of keep
or drop may be specified.


filter Optional A character string containing a logical expression
for selecting the rows to be exported. For details, see
Filter Expressions.

format Optional A single character string specifying the format for
each field when exporting to a formatted ASCII
(FASCII) text file. For details, see Notes on
Importing Files of Certain Types.

delimiter Optional A character string specifying the delimiter to use.
The default is a blank space (" "). This argument is
used only when exporting to ASCII text files.

colNames Optional A logical flag. If TRUE, column names are also
exported.

rowNames Optional A logical flag. If TRUE, row names are also exported.

quote Optional A logical flag. If TRUE, quotes are placed around
character strings. The default is TRUE.

odbcConnection Required if type="ODBC" An encrypted character string containing the
ODBC connection string.

odbcTable Required if type="ODBC" The name of the ODBC table to be created.

time.out.format Optional A character string specifying the format to use when
exporting date/time data to ASCII or FASCII text
files.
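
As a sketch of how the arguments in Table 5.4 combine, the following wrapper
exports selected columns of a data frame to a comma-delimited ASCII file with
a header row. The function name export.csv and its arguments are hypothetical;
only the exportData arguments themselves come from the table.

```r
# Hypothetical wrapper: export selected columns as comma-delimited text
export.csv <- function(x, path, columns)
{
    exportData(data = x, file = path, type = "ASCII",
        keep = columns,      # export only these variables
        delimiter = ",",     # the default delimiter is a blank space
        colNames = T,        # write the column names
        rowNames = F,        # omit the row names
        quote = T)           # quote character strings
}
```

A call such as export.csv(auto, "auto.txt", c("Price", "Country")) would then
write two columns of the auto data frame created earlier.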

Other Data Export Functions

In addition to the exportData function, S-PLUS provides several other
functions for exporting data, discussed below.

The data.dump Function

When you want to share your data with another S-PLUS user, you can
export your data to an S-PLUS file format by using the data.dump
function:

> data.dump("matz")

By default, the data object matz is exported to the file dumpdata in
your S-PLUS start-up folder. You can specify a different output file
with the connection argument to data.dump:

> data.dump("matz", connection="matz.dmp")

Hint

The connection argument needn’t specify a file; it can specify any valid S-PLUS connection
object.

If the data object you want to share is not in your working data, you
must specify the object’s location in the search path with the where
argument:

> data.dump("halibut", where="data")

The cat and write Functions

The inverse operation to the scan function is provided by the cat and
write functions. The result of either cat or write is just an ASCII file
with data in it; there is no S-PLUS structure written to the file. Of the
two commands, write has an argument for specifying the number of
columns and thus is more useful for retaining the format of a matrix.

The cat function is a general-purpose writing tool in S-PLUS, used for
writing to the screen as well as writing to files. It can be useful in
creating free-format data files for use with other software, particularly
when used with the format function:

> cat(format(runif(100)), fill=T)
0.261401257 0.556708986 0.184055283 0.760029093 ....

The argument fill=T limits line length in the output file to the width
specified in your options object. To use cat to write to a file, simply
specify a file name with the file argument:

> x <- 1:1000
> cat(x,file="mydata",fill=T)

Note

The files written by cat and write do not contain S-PLUS structure information. To read them
back into S-PLUS, you must reconstruct this information.

By default, write writes matrices column by column, five values per
line. If you want the matrix represented in the ASCII file in the same
form in which it is represented in S-PLUS, first transpose the matrix with
the t function and specify the number of columns in your original matrix:

> mat
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
> write(t(mat), "mat", ncol=4)

You can view the resulting file with a text editor; it contains the
following three lines:

1 4 7 10
2 5 8 11
3 6 9 12
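
Because the file holds only the values, reading the matrix back means
supplying its shape again. The following sketch writes the matrix shown above
to a temporary file and rebuilds it with scan and matrix; the temporary file
name is generated by tempfile and is not significant.

```r
mat <- matrix(1:12, nrow = 3)

# Write the matrix row by row, four values per line
datafile <- tempfile()
write(t(mat), datafile, ncol = 4)

# Read the values back and restore the 3 x 4 shape
back <- matrix(scan(datafile), ncol = 4, byrow = T)
all(back == mat)   # T
unlink(datafile)
```

The byrow=T argument matters here: the file was written row by row, so the
values must be placed into the new matrix in the same order.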

The write.table Function

The inverse operation to read.table is provided by write.table.
The write.table function can be used to export a data frame into an
ASCII text file:

> write.table(fuel.frame, "fuel.txt")


EXPORTING GRAPHS
The export.graph function is used to export a graph named Name to
the file FileName using the file format specified by ExportType. Table
5.5 lists the arguments to the export.graph function.
Table 5.5: Arguments to export.graph.

Argument  Required or Optional  Description

FileName Required A character string specifying the name of the file to
be created. If a file by this name already exists, it is
overwritten.

Name Optional A character string specifying the object pathname
for a graphsheet. The default uses the graphsheet
that is currently active. If no graphsheet is active,
then Name is required.

ExportType Optional A character string specifying the file type of the
exported graph. For a complete discussion of this
argument, see Specifying the ExportType Argument.

QFactor Optional An integer value that determines the degree of loss
in the compression process. For a complete
discussion of this argument, see Specifying the
QFactor Argument.

ColorBits Optional An integer value that specifies the color bits value
used when saving an image. For a complete
discussion of this argument, see Specifying the
ColorBits Argument.

Height Optional A numeric value that specifies the height of the
output image. This argument accepts any floating
point value. The default of -1 causes the graph page
height to be used.


Width Optional A numeric value that specifies the width of the
output image. This argument accepts any floating
point value. The default of -1 causes the graph page
width to be used.

Units Optional A character string that specifies the units of the
Height and Width arguments. Recognized values are
"inch" and "cm"; any other input is interpreted as
"inch", the default value.

Specifying the ExportType Argument

Some of the most common values for the ExportType argument
include "BMP", "WMF", "EPS", "EPS TIFF", "TIF", "GIF", "JPG", "PNG",
"IMG", "EXIF", "PCT", "TGA", and "WPG". If this argument is not
specified, the file type is inferred from the extension used in the
FileName argument.

Table 5.6 describes the map between file extensions and file types. If
FileName does not include an extension from Table 5.6, one is added
based on the value of ExportType. To export a graph to a file that
does not have an extension, specify the appropriate ExportType
format and end the FileName character string with a period.
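
The following sketch shows how these arguments might be combined to export the
active graphsheet as a PNG file of a given size. The wrapper name export.png
and its defaults are hypothetical; the export.graph arguments are those listed
in Table 5.5, and a ColorBits value of 24 is one of the settings listed for
"PNG" in Table 5.7.

```r
# Hypothetical wrapper: export the active graphsheet as a PNG file
export.png <- function(path, width = 6, height = 4)
{
    export.graph(FileName = path, ExportType = "PNG",
        Width = width, Height = height, Units = "inch",
        ColorBits = 24)
}
```

Calling export.png("figure1.png") uses the extension as given, while
export.png("figure1.") ends the name with a period and so, according to the
rule above, produces a file without an extension.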
Table 5.6: Map between file extensions and file types for the ExportType argument.

Extension ExportType Setting File Format

.bmp BMP Windows Bitmap, with no compression

.cal CAL CALS Raster file

.cmp CMP LEAD Compression Format

.emf EMF Windows Enhanced MetaFile

.eps EPS Encapsulated PostScript

.fax FAX Raw FAX, compressed using CCITT group 3, 1 dimension

.gif GIF CompuServe GIF (requires that LZW compression be enabled)

.ica ICA IOCA, compressed using CCITT group 3, 1 dimension

.img IMG GEM Image

.jpg JPG JPEG File Interchange Format with YUV 4:4:4 color space

.mac MAC MacPaint

.msp MSP Microsoft Paint

.pct PCT MacPict

.pcx PCX ZSoft PCX

.png PNG Portable Network Graphics

.psd PSD Adobe Photoshop 3.0

.ras RAS Sun Raster file

.tga TGA TrueVision TARGA

.tif TIF Tagged Image File Format, with no compression and with RGB color space

.wfx WFX Winfax, compressed using CCITT group 3, 1 dimension

.wmf WMF Windows MetaFile

.wpg WPG Word Perfect

Specifying the QFactor Argument

The QFactor argument is a number that determines the degree of loss
in the compression process when saving an image file to the following
ExportType formats: "CMP", "JPG", "JPG YUV4", "JPG YUV2", "JPG YUV1",
"TIF JPG", "TIF JPG YUV4", "TIF JPG YUV2", "TIF JPG YUV1",
and "EXIF JPG". The valid range is from 2 to 255, with 2 resulting in
perfect quality and 255 resulting in maximum compression. The
default value is 2.

Note

The effect of this argument is comparable to that of the “quality” parameter (0-100%) used in most
applications that view and convert JPEG graphics, except that the scale is reversed: lower QFactor
values give higher quality.

Specifying the ColorBits Argument

Valid options for each format are listed in Table 5.7. The default is to
use the maximum value supported by the requested format. This
argument is ignored for the following ExportType formats: "EMF",
"EPS", "EPS TIFF", "EPS WMF", and "WMF".

Table 5.7: Valid options for the ColorBits argument.

ExportType Setting            Format Description                          ColorBits Setting

JPEG and LEAD Compressed
"CMP"                         LEAD Compression Format                     8, 24
"JPG" or "JPG YUV4"           JPEG File Interchange Format with YUV       8, 24
                              4:4:4 color space
"JPG YUV2"                    JPEG File Interchange Format with YUV       8, 24
                              4:2:2 color space
"JPG YUV1"                    JPEG File Interchange Format with YUV       8, 24
                              4:1:1 color space


Compressed TIFF
"TIF JPG" or "TIF JPG YUV4"   Tagged Image File with JPEG compression     8, 24
                              and YUV 4:4:4 color space
"TIF JPG YUV2"                Tagged Image File with JPEG compression     8, 24
                              and YUV 4:2:2 color space
"TIF JPG YUV1"                Tagged Image File with JPEG compression     8, 24
                              and YUV 4:1:1 color space
"TIF PACK"                    Tagged Image File with PackBits             1, 2, 3, 4, 5, 6, 7, 8, 16, 24, 32
                              compression and RGB color space
"TIF PACK CMYK"               Tagged Image File with PackBits             24, 32
                              compression and CMYK color space
"TIF PACK YCC"                Tagged Image File with PackBits             24
                              compression and YCbCr color space
"CCITT"                       TIFF, compressed using CCITT
"CCITT G3 1D"                 TIFF, compressed using CCITT, group 3, 1
                              dimension
"CCITT G3 2D"                 TIFF, compressed using CCITT, group 3, 2
                              dimensions
"CCITT G4"                    TIFF, compressed using CCITT, group 4

TIFF Without Compression
"TIF"                         Tagged Image File Format, with no           1, 2, 3, 4, 5, 6, 7, 8, 16, 24, 32
                              compression and with RGB color space
"TIF CMYK"                    Tagged Image File Format, with no           24, 32
                              compression and with CMYK color space
"TIF YCC"                     Tagged Image File Format, with no           24
                              compression and with YCbCr color space

BMP Formats
"BMP"                         Windows BMP, with no compression            1, 4, 8, 16, 24, 32
"BMP RLE"                     Windows BMP, with RLE compression           4, 8
"OS2"                         OS/2 BMP version 1.x                        1, 4, 8, 24
"OS2 2"                       OS/2 BMP version 2.x                        1, 4, 8, 24

Exif Formats
"EXIF"                        Exif file containing a TIFF image, no       24
                              compression with RGB color space
"EXIF YCC"                    Exif file containing a TIFF image, no       24
                              compression with YCbCr color space
"EXIF JPG"                    Exif file containing a JPEG compressed      24
                              image
"EXIF 411"                    Exif 2.0 file containing a JPEG             24
                              compressed image


Other Color Formats
"PCX"                         ZSoft PCX                                   1, 4, 8, 24
"WMF"                         Windows MetaFile                            24
"EMF"                         Windows Enhanced MetaFile                   24
"PSD"                         Adobe Photoshop 3.0                         1, 8, 24
"PNG"                         Portable Network Graphics                   1, 4, 8, 24
"TGA"                         TrueVision TARGA                            8, 16, 24, 32
"EPS"                         Encapsulated PostScript                     24
"EPS TIFF"                    Encapsulated PostScript with TIFF header    24
"EPS WMF"                     Encapsulated PostScript with WMF header     24
"RAS"                         Sun Raster                                  1, 4, 8, 24, 32
"WPG"                         Word Perfect (raster only)                  1, 4, 8
"PCT"                         MacPict                                     1, 4, 8, 24

Formats requiring LZW compression to be enabled
"TIF LZW"                     Tagged Image File Format with LZW           1, 2, 3, 4, 5, 6, 7, 8, 16, 24, 32
                              compression and RGB color space
"TIF LZW CMYK"                Tagged Image File Format with LZW           24, 32
                              compression and CMYK color space
"TIF LZW YCC"                 Tagged Image File Format with LZW           24
                              compression and YCbCr color space
"GIF"                         CompuServe GIF                              1, 2, 3, 4, 5, 6, 7, 8

1-Bit FAX Formats
"FAX" or "FAX G3 1D"          Raw FAX, compressed using CCITT group 3,    1
                              1 dimension
"FAX G3 2D"                   Raw FAX, compressed using CCITT group 3,
                              2 dimensions
"FAX G4"                      Raw FAX, compressed using CCITT group 4
"WFX" or "WFX G3"             Winfax, compressed using CCITT group 3,
                              1 dimension
"WFX G4"                      Winfax, compressed using CCITT group 4
"ICA" or "ICA G3 1D"          IOCA, compressed using CCITT group 3, 1
                              dimension
"ICA G3 2D"                   IOCA, compressed using CCITT group 3, 2
                              dimensions
"ICA G4"                      IOCA, compressed using CCITT group 4
"ICA RAW" or "ICA RAW G3 1D"  IOCA, compressed using CCITT group 3, 1
                              dimension, without the MO:DCA wrapper
"ICA RAW G3 2D"               IOCA, compressed using CCITT group 3, 2
                              dimensions, without the MO:DCA wrapper
"ICA RAW G4"                  IOCA, compressed using CCITT group 4,
                              without the MO:DCA wrapper
"CAL"                         CALS Raster file

Other 1-Bit Formats
"MAC"                         MacPaint                                    1
"MSP"                         Microsoft Paint
"IMG"                         GEM Image

Chapter 5 Importing and Exporting

CREATING HTML OUTPUT


S-PLUS provides a variety of tools for generating HTML output. In
this section, we discuss how to generate HTML tables, save
preformatted text output, and save graphs with HTML references.

Tables

The html.table function generates a vector of character strings that
represents a vector, matrix, or data frame as an HTML table. The
vector contains one string for each line of HTML. You can write it to
a file directly by specifying the file argument, or manipulate it and
write it to a file later using the write function.
For example, we can create a file catalyst.htm containing the
catalyst data frame using:

> html.table(catalyst, file="catalyst.htm")
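
To make "one string for each line of HTML" concrete, the following sketch builds such a vector by hand. The function simple.html.table is a hypothetical helper written only for this illustration; it is not the html.table implementation, and the tags it emits are an assumption about what a minimal table might look like.

```r
# Hypothetical helper, for illustration only: returns a character
# vector with one element per line of HTML.
simple.html.table <- function(x, caption = NULL)
{
    x <- as.matrix(x)
    html <- "<TABLE BORDER=1>"
    if(!is.null(caption))
        html <- c(html, paste("<CAPTION>", caption, "</CAPTION>", sep = ""))
    for(i in seq(length = nrow(x))) {
        # One <TD> cell per column, collapsed into a single row string.
        cells <- paste("<TD>", as.character(x[i, ]), "</TD>",
            sep = "", collapse = "")
        html <- c(html, paste("<TR>", cells, "</TR>", sep = ""))
    }
    c(html, "</TABLE>")
}

tab <- simple.html.table(matrix(1:4, nrow = 2), caption = "Demo")
print(tab)
```

A vector of this form can be edited like any other character vector and then passed to write, in the same way as the value returned by html.table.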

In addition to accepting a vector, matrix, or data frame, the
html.table function will accept a simple list with such structures as
components of the list. It will then produce a sequence of tables with
the list component names encoded as table captions. For example:

> my.results <- list("Regression Coefficients" =
+ coef(lm(Mileage~Weight, fuel.frame)),
+ "Correlations" = cor(fuel.frame[,1:3]))
> html.table(my.results, file="my.htm")

The html.table function accepts any of the arguments to format,
allowing specification of formatting details such as the number of
digits displayed. In addition, the append argument controls whether
output is appended to the specified file or the file is overwritten.
The append argument is also available in the write function, which is
useful for interspersing html.table output and descriptive text:

> write("<H3> S-PLUS Code for the above </H3>
Continue string: <P> Put code here </P>",
+ file="my.htm", append=T)
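
As a rough sketch of this interleaving, the following lines build a small page with successive calls to write. The file name comes from tempfile and the HTML fragments are placeholders chosen for illustration. Only the first call omits append, so it is the only one that overwrites the file.

```r
# Illustrative only: the file name and page content are placeholders.
page.file <- tempfile()

# The first call creates (or overwrites) the file.
write("<H3> Results </H3>", file = page.file)

# Later calls add to the end of the file because append=T.
write("<P> First paragraph </P>", file = page.file, append = T)
write(c("<PRE>", "1 2 3", "</PRE>"), file = page.file, append = T)
```

Because write puts each element of a character vector on its own line, the file ends up with five lines in the order in which they were written.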

Additional arguments to html.table are described in the function’s
help file.

Note that html.table is designed to work with the previously
mentioned data structures. For other structures such as functions,
calls, and objects with specific print methods, the results of
html.table may not be satisfactory. Instead, the object may be
printed as preformatted text and embedded in the HTML page.

Text

The sink function may be used to direct S-PLUS text output to an
HTML file. Enclose the output in the HTML markup tags <PRE> and
</PRE> to indicate that it is preformatted. Additional textual
description and HTML markup tags may be interspersed with the
S-PLUS output using cat.

> sink("my.htm")
> cat("<H3> Linear Model Results </H3> \n")
> cat("<PRE>")
> summary(lm(Mileage~Weight, fuel.frame))
> cat("</PRE>")
> sink()

The paste and deparse functions are useful for constructing strings to
display with cat. See their help files for details.
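
For instance, a heading that quotes the expression whose output follows might be built along these lines. The function html.heading is a hypothetical helper invented for this sketch, and the wording of the heading is arbitrary.

```r
# Hypothetical helper, for illustration only.
html.heading <- function(x)
{
    # substitute() returns the unevaluated argument; deparse() converts
    # it to character strings, which paste() joins into a single string.
    expr.text <- paste(deparse(substitute(x)), collapse = " ")
    paste("<H3> Output of ", expr.text, " </H3>", sep = "")
}

# The argument is never evaluated, so no model is actually fitted here.
cat(html.heading(summary(lm(Mileage ~ Weight, fuel.frame))), "\n")
```

Inside a sink session, a call like this could replace a hand-typed heading, so that the heading always matches the expression that produced the output.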

CHAPTER 6: DEBUGGING YOUR FUNCTIONS

Introduction
Basic S-PLUS Debugging
    Printing Intermediate Results
    Using recover
Interactive Debugging
    Starting the Inspector
    Examining Variables
    Controlling Evaluation
    Entering, Marking, and Tracking Functions
        Entering Functions
        Marking Functions
        Marking the Current Expression
        Viewing and Removing Marks
        Tracking Functions
    Modifying the Evaluation Frame
    Error Actions in the Inspector
Other Debugging Tools
    Using the S-PLUS Browser Function
    Using the S-PLUS Debugger
    Tracing Function Evaluation

Chapter 6 Debugging Your Functions

INTRODUCTION
Debugging your functions generally takes much longer than writing
them because relatively few functions work exactly as you want them
to the first time you use them. You can (and should) design large
functions before writing a line of code, but because of the interactive
nature of S-PLUS, it is often more efficient to simply type in a smaller
function, then test it and see what improvements it might need.
S-PLUS provides several built-in tools for debugging your functions.
In general, these tools make use of the techniques described in
Chapter 4, Writing Functions in S-PLUS, to provide you with as much
information as possible about the state of the evaluation.
In this chapter, we describe several techniques for debugging S-PLUS
functions using these built-in tools as well as the techniques of
Chapter 19, Computing on the Language, to extend these tools even
further. For a discussion of debugging loaded code, see Chapter 15,
Interfacing With C and Fortran Code. Refer also to Chapter 20, Data
Management, for a detailed discussion of frames.


BASIC S-PLUS DEBUGGING


When an error occurs in an S-PLUS expression, S-PLUS generally
returns an error message and the word Dumped:

> acf(corn.rain,type="normal")
Problem in switch(itype + 1,: desired type of ACF is
unknown
Use traceback() to see the call stack
Dumped

With existing functions such as acf, most errors occur because of
incorrectly specified arguments, such as nonexistent (or currently
unattached) data objects, invalid choices of values (as in our choice of
"normal" in the call to acf), or omitted required arguments. When
you encounter a problem with a built-in function, then, your first
debugging tool is probably the function’s help file. Use the help file to
be sure you have the correct calling syntax and have supplied the
correct arguments.
Similarly, when you encounter a problem in a function you have
newly written, the first debugging tool is the function’s definition.
Looking at the definition carefully can often reveal a variety of
problems:
• Misused functions. If your function definition includes calls to
unfamiliar functions, check the help files to be sure you are
using those functions correctly.
• Uninitialized variables (often the culprit in messages such as
Cannot find object "object"). Look for these particularly in
looping constructions, because loops frequently contain
assignments such as a[i] <- value. If a is initially empty you
may well have forgotten to create it.
• Inadequate input filtering. You may have intended to allow
vectors, matrices, and lists as input, but neglected to put in the
code required to differentiate among the various cases.
Similarly, you may have neglected to include if and stop
statements to explicitly exclude certain cases.

• Environmental dependencies. Many functions implicitly use
various settings of the S-PLUS environment. For example,
graphics functions require active graphics devices and
recursive functions often require deeper nesting than the
default value of options("expression").
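
The second and third problems in this list can often be avoided with a few defensive lines at the top of a function. The sketch below shows one way to do this; column.means is a hypothetical function invented for this illustration, and the particular checks are assumptions about what such a function might want to allow.

```r
# Hypothetical example, for illustration only.
column.means <- function(x)
{
    # Input filtering: explicitly exclude cases the function does not handle.
    if(is.list(x))
        stop("x must be a vector or matrix, not a list")
    if(!is.numeric(x))
        stop("x must be numeric")
    # Differentiate among the allowed cases: treat a vector as one column.
    if(!is.matrix(x))
        x <- matrix(x, ncol = 1)
    # Create the result before the loop, so that result[i] <- value
    # has an existing object to assign into.
    result <- numeric(ncol(x))
    for(i in seq(length = ncol(x)))
        result[i] <- mean(x[, i])
    result
}

print(column.means(matrix(1:6, ncol = 2)))
print(column.means(c(1, 2, 3, 4)))
```

Calling column.means with a list or a character vector stops immediately with a message that names the problem, which is usually easier to diagnose than an error raised deep inside the loop.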
A useful aid in examining your function is the traceback function,
which lists the nested function calls currently being evaluated, starting
with the function from which the error was returned and working
outward to the original calling function. For the example above,
traceback gives the following information:

> traceback()
6: eval(action, sys.parent())
5: doErrorAction("Problem in switch(itype + 1,: desired
type of ACF is unknown",
4: stop("desired type of ACF is unknown")
3: acf(corn.rain, type = "normal")
2: eval(expression(acf(corn.rain, type = "normal")))
1:
Message: Problem in switch(itype + 1,: desired type of
ACF is unknown

Using traceback is a good way to focus your initial examination. You
should get in the habit of typing traceback() whenever a function
call returns an error and the Dumped message.

Printing Intermediate Results

One of the oldest techniques for debugging, and still widely used, is to
print intermediate results of computations directly to the screen. By
examining intermediate results in this way, you can see if correct
values are used as arguments to functions called within the top-level
function.
This can be particularly useful when, for example, you are using
paste to construct a set of elements. Suppose that you have written a
function to make some data sets, with names of the form datan (data1,
data2, and so on), where each data set contains some random numbers:

make.data.sets <-
function(n)
{
    names <- paste("data", 1:n)
    for(i in 1:n)
    {
        assign(names[i], runif(100), where = 1)
    }
}

After writing this function, you try it:

> make.data.sets(5)

S-PLUS reports no errors, so you look for your newly created data set,
data4:

> data4
Error: Object "data4" not found

To find out what names the function actually was creating, put a cat
statement into make.data.sets after assigning names:

> make.data.sets
function(n)
{
    names <- paste("data", 1:n)
    cat(names, "\n")
    for(i in 1:n)
    {   assign(names[i], runif(100), where = 1)
    }
}
> make.data.sets(5)
data 1 data 2 data 3 data 4 data 5

The cat function prints the output in the simplest form possible; you
can get more usual-looking S-PLUS output by using print or show
instead (the show function was introduced in S-PLUS 5.0 as a more
object-oriented version of print):

> make.data.sets
function(n)
{
names <- paste("data", 1:n)
print(names)
for(i in 1:n)
{ assign(names[i], runif(100), where = 1)
}
}
> make.data.sets(5)
[1] "data 1" "data 2" "data 3" "data 4" "data 5"

The form of these names is not quite what we wanted, so we look at
the paste help file, and discover that we need to specify the sep
argument as "". We fix make.data.sets, but retain the call to print as
a check:

> make.data.sets
function(n)
{ names <- paste("data", 1:n, sep = "")
print(names)
for(i in 1:n)
{ assign(names[i], runif(100), where = 1)
}
}
> make.data.sets(5)
[1] "data1" "data2" "data3" "data4" "data5"
> data4
[1] 0.784289481 0.138882026 0.656852996 0.443559750
[5] 0.651548887 . . .

Now that make.data.sets works as we’d hoped it would, we can
remove the print statement. (Of course, if you’d always like to see the
exact names of the data sets created, you might want to leave it in.)
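
A middle course is to keep the diagnostic output but make it optional. The following sketch assumes a hypothetical verbose argument and a simplified function, make.data.names, that only constructs and returns the names; it is not part of make.data.sets as defined above.

```r
# Hypothetical variant, for illustration only: the verbose argument
# is an assumption, not part of the original make.data.sets.
make.data.names <- function(n, verbose = F)
{
    names <- paste("data", 1:n, sep = "")
    if(verbose)
        print(names)    # intermediate result, shown only on request
    names
}

quiet <- make.data.names(3)
loud <- make.data.names(3, verbose = T)
```

With this arrangement the check stays in the function definition, but it costs nothing in ordinary use and can be switched on the next time the names look wrong.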

Using recover

The recover function can be used to provide interactive debugging as
an error action. To use recover, set your error action as follows:

options(error=expression(if(interactive())
recover() else dump.calls()))

Then, for those types of errors that would normally result in the
message “Problem in ... Dumped,” you are instead asked
“Debug ? ( y|n ):”. If you answer y, you are put into recover’s
interactive debugger, with an R> prompt. Type ? at the R> prompt to
see the available commands. Use up to move up the frame list and
down to move down the list. As you move to each frame, recover
provides you with a list of local variables. Type the name of a local
variable to see its current value. For example, here is a brief session
that follows a faulty call to the sqrt function:

> sqrt(exp)
Problem in x^0.5: needed atomic data, got an object of class
"function"
Debug ? ( y|n ): y
Browsing in frame of x^0.5
Local Variables: .Generic, .Signature, e1, e2

R> ?
Type any expression. Special commands:
`up', `down' for navigation between frames.
`where' # where are we in the function calls?
`dump' # dump frames, end this task
`q' # end this task, no dump
`go' # retry the expression, with corrections made
Browsing in frame of x^0.5
Local Variables: .Generic, .Signature, e1, e2
R> up
Browsing in frame of sqrt(exp)
Local Variables: x
R(sqrt)> x
function(x)
.Internal(exp(x), "do_math", T, 108)
R(sqrt)> x<-exp(1)
R(sqrt)> go
[1] 1.648721

In the example session, we accidentally gave a function as the
argument to sqrt, rather than the needed atomic data object. Inside
recover, we move up to sqrt’s frame, change the argument x to the
result of a function call, then use recover’s go command to complete
the expression.

INTERACTIVE DEBUGGING
Although print, show, and cat statements can help you find many
bugs, they aren’t a particularly efficient way to debug functions,
because you need to make your modifications in a text editor, run the
function, examine the output, then return to the text editor to make
further modifications. If you are examining a large number of
assignments, the simple act of adding the print statements can
become wearisome.
The recover function lets you browse frames interactively, but it
lacks true debugging facilities, such as the ability to step through
code a line at a time, set breakpoints, and track functions.
With the interactive debugging function inspect you can follow the
evaluation of your function as closely as you want, from stepping
through the evaluation expression-by-expression to running the
function to completion, and almost any level of detail in between.
While inspecting you can do any of the following tasks:
• examine variables in the function’s evaluation frame. Thus,
print and cat statements are unnecessary. You can also look
at function definitions.
• track functions called by the current function. You can request
that a message be printed on entry or exit, and that your own
expressions be installed at those locations.
• mark the current expression. If the marked expression occurs
again during the inspection session, evaluation halts at that
point. Functions can be marked as well; evaluation will halt at
the top of a marked function whenever it is called. Marking an
expression or function corresponds to setting a breakpoint.
• enter a function; this allows you to step through a single
function call, without stopping in subsequent calls to the same
function.
• examine the current expression, together with the current
calling stack. The calling stack lets you know how deeply
nested the current expression is, and how you got there.

• step through n expressions or subexpressions. By default, the
inspector automatically stops before each new expression or
function call. You can also do groups of expressions, such as a
braced set of expressions, or a complete conditional
expression.
• evaluate arbitrary S-PLUS expressions. These expressions are
evaluated in the local evaluation frame, so, for example, you
can assign new values to objects in the local frame. In many
cases, this lets you experiment with fixes to your code during
the evaluation.
• keep track of expressions and functions that are marked or
tracked, as well as expressions scheduled for evaluation on
exit. You can also monitor the current function’s return value.
• complete evaluation of the current loop or function, or resume
evaluation, stopping only for marked functions or
expressions.
• look at objects and evaluate expressions in any frame.
The following subsections describe these tasks in detail, and show
how to perform them within inspect.

Starting the Inspector

To start a session with the inspector, call inspect with a specific
function call as an argument. For example, the call to make.data.sets
with n=5 resulted in a problem, so we can try to track it down by
starting inspect as follows:

> inspect(make.data.sets(5))
entering function make.data.sets
stopped in make.data.sets (frame 3), at:
names <- paste("data", 1:n)
d>

For simplicity, we refer to the function appearing in the argument to
inspect as the function being inspected. The d> prompt indicates that
you are in the inspector environment. The inspector environment has
a limited instruction set; the instructions are shown in Table 6.1. If you
type anything at the inspector prompt other than those instructions,
you get a syntax error message.

Inspector instructions are not S-PLUS function calls; do not use
parentheses when issuing them. Use the help instruction to see a list
of instructions; type help instruction for help on a particular
instruction.
To leave the inspector and return to the S-PLUS prompt, use the
instruction quit.

Examining Variables

You can obtain a listing of objects in the current evaluation frame with
the inspector instruction objects. For example, in our call to
make.data.sets, we obtain the following listing from objects:

d> objects
[1] ".Auto.print" ".entered." ".name." "n"

To examine the contents of these objects, use the inspector instruction
eval followed by the object’s name:

d> eval n
[1] 5

To examine a function definition, rather than a data variable, use the
instruction fundef:

d> fundef make.data.sets
make.data.sets
function(n)
{   names <- paste("data", 1:n)
    {   for(i in 1:n)
        {   assign(names[i], runif(100), where = 1 )
        }
    }
}

When you use eval or fundef to look at S-PLUS objects, you can in
general just type the name of the object after the instruction, as in the
examples above. Names in S-PLUS that correspond to the inspect
function’s keywords must be quoted when used as names. Thus, if
you want to look at the definition of the objects function, you must
quote the name "objects", because objects is an inspect keyword.
For a complete description of the quoting rules, type help name within
an inspection session. For a complete list of the keywords, type help
keywords.


One important question that arises in the search for bugs is “Which
version of that variable is being used here?” You can answer that
question using the find instruction. For example, consider the
examples fcn.C and fcn.D given in Matching Names and Values on
page 903. We can use find inside the inspector to demonstrate that
the value of x used by fcn.D is not the value defined in fcn.C:

> inspect(fcn.C())
entering function fcn.C
stopped in fcn.C (frame 3), at:
        x <- 3
d> track fcn.D
entry and exit tracking enabled for fcn.D
d> mark fcn.D
entry mark set for fcn.D
exit mark(s) set for fcn.D ( some or all were already set )
d> resume
entering function fcn.D
call was: fcn.D() from fcn.C (frame 3)
stopped in fcn.D (frame 4), at:
        return(x^2)
d> objects
[1] ".Auto.print" ".entered." ".name."
d> find x
.Data

See Entering, Marking, and Tracking Functions on page 220 for
complete details on using the track and mark instructions.

You can inspect the value of variables in different frames by using the
up or down instructions to change the frame in which objects looks for
objects and eval evaluates them. For example, we could find the
value 3 in fcn.C’s frame while in fcn.D as follows:

. . .
stopped in fcn.D (frame 4), at:
        return(x^2)
d> objects
[1] ".Auto.print" ".entered." ".name."
d> up
fcn.C (frame 3)
d> objects
[1] ".Auto.print" ".entered." ".name." "x"
d> eval x
[1] 3

Table 6.1: Instructions for the interactive inspector.

Keyword
    Help given

help [instruction | names | keywords]
    Provides help on instruction, names, or keywords. With no
    arguments, help gives a summary of the available instructions.

complete [loop | function]
    Evaluates to the end of the next for/while/repeat loop, or to the
    point of function return.

debug.options [echo = T|F] [marks = hard|soft]
    With echo=T, expressions are printed before they are evaluated.
    With marks=hard, evaluation always halts at a marked expression.
    With marks=soft it halts only during a resume. Setting marks=soft
    is a way of temporarily hiding marks for do, complete, etc. The
    defaults are: echo=F, marks=hard. With no arguments,
    debug.options displays the current settings.

do [n]
    Evaluates the next n expressions which are at the same level as
    the current one. The default is 1. Thus if evaluation is stopped
    directly ahead of a braced group, do does the entire group.

down [n]
    Changes the local frame for instructions such as objects and eval
    to be n frames deeper than the current one. The default is 1.
    After any movement of the evaluator (step, resume, etc.), the
    local frame at the next stop is that of the function stopped in.

enter
    Enters the function called in the next expression.

eval expr
    Evaluates the S-PLUS expression expr.

find name
    Reports where name would be found by the evaluator.

fundef [name]
    Prints the original function definition for name. Default is the
    current function. Tracked and marked functions will have modified
    function definitions temporarily installed; fundef is used to view
    the original. The modified and original versions will behave the
    same; the modified copy just incorporates tracing code.

mark
    Remembers the current expression; evaluation will halt here from
    now on.

mark name1 [name2 ...] [at entry|exit]
    Arranges to stop in the named functions. The default is to stop at
    both entry and exit.

objects
    Names of objects in this function’s frame.

on.exit
    Displays the current on-exit expressions for this function.

quit
    Abandons evaluation, return to top-level prompt.

resume
    Resumes evaluation.

return.value
    Displays the return value, if known.

show [tracks | marks | all]
    Displays installed tracks and marks. Default all.

step [n]
    Evaluates the next n expressions. Default 1.

track name1 [name2 ...] [at entry|exit] [print = T|F] [with expr]
    Enables or modifies entry and/or exit tracking for the named
    functions. The default for print is T. You can use any S-PLUS
    expression as expr.

unmark name1 [name2 ...] [at entry|exit]
    Deletes mark points at the named locations in the named functions.

unmark n1 [n2 ...]
    Deletes mark points n1, n2, .... See mark and show.

unmark all
    Deletes all mark points.

untrack name1 [name2 ...]
    Disables tracking for the named functions.

up [n]
    Changes the local frame for instructions such as objects and eval
    to be n frames higher than the current one. The default is 1.
    After any movement of the evaluator (step, resume, etc.), the
    local frame at the next stop is that of the function stopped in.

where
    Displays stack of function calls, and current expression in
    current function.

Controlling Evaluation

Within the inspector, you can control the granularity at which
expressions are evaluated. For the finest control, use the step
instruction, which, by default, evaluates the next expression or
subexpression. The inspector automatically determines stopping
points before each expression. Issuing the step instruction once takes
you to the next stopping point. To clarify these concepts, consider
again our call to make.data.sets. You can see the current position
using the where instruction:

d> where
Frame numbers and calls:
4: debug.tracer(what = TR.GENERIC, index = c(2, 1)) from 3
3: make.data.sets(5) from 1
2: inspect(make.data.sets(5)) from 1
1: from 1
--------------------
stopped in make.data.sets (frame 3), at:
        names <- paste("data", 1:n)

The numbered lines in the output from where represent the call stack;
they outline the frame hierarchy. The position is shown by the lines

stopped in make.data.sets (frame 3), at:
        names <- paste("data", 1:n)

If we issue the step instruction, we move to the next stopping point,
which is right before the function call to paste:

d> step
stopped in make.data.sets (frame 3), at:
        paste("data", 1:n)

Another step instruction completes the evaluation of the call to
paste, and takes us to the beginning of the next expression:

d> step
stopped in make.data.sets (frame 3), at:
        return(for(i in 1:n)
        { assign(names[i], runif(100), where = 1 )
        }
        ...

You can step over several stopping points by typing an integer after
the step instruction. For example, you could step over the complete
expression names <- paste("data", 1:n) with the instruction step 2.

You should distinguish between these automatically determined
stopping points and breakpoints, which you insert using the mark
instruction. Breakpoints allow you to stop evaluation at particular
expressions or functions, and either step through from that point or
resume evaluation until the next breakpoint is encountered.
Breakpoints and marks are discussed in detail in Entering, Marking,
and Tracking Functions on page 220.

Another way to execute a complete expression is to use the do
instruction. The do instruction has the advantage that you do not
need to know how many stopping points the expression contains; do
evaluates the entire current expression. For example, you can do the
following complete expression with a single do instruction:

return(for(i in 1:n)
{ assign(names[i], runif(100), where = 1 )
}
...

The do instruction is particularly helpful when, as in this example, the
current expression includes a loop or conditional expression. Using
step causes the loop or conditional to be entered, and each
subexpression evaluated in turn. Using do evaluates the entire
expression atomically.
To evaluate larger pieces of the function, use the complete and resume
instructions. Use complete to complete the current loop, if within a
loop, or, if not, complete the current function. You can specify
complete loop or complete function to override the default
behavior. Thus, if you are within a for loop and type complete
function, evaluation proceeds to the end of the current function. The
inspector stops at the point after the function’s last expression, before
the on-exit expressions are executed. You can look at the return value
and the on-exit expressions before exiting. Use the instruction
return.value to see the return value; use the instruction on.exit to
see the on-exit expressions.
Use resume to resume evaluation and proceed to the next breakpoint.
If there are no further breakpoints, resume completes the call given to
the inspector. Evaluation always stops at a breakpoint, unless you use
the debug.options instruction to set marks=soft. If you specify the
marks as “soft,” the do, step and complete instructions ignore
breakpoints, while resume stops at them as usual.

Entering, Marking, and Tracking Functions

By default, inspect lets you step through the expressions in the
function being inspected. Function calls within the function being
debugged are evaluated atomically. However, you can extend the
step-through capability to such functions using the enter and mark
instructions. You can also monitor calls to a function, without stepping
through them, with the track instruction.

Limitations on marking and tracking

You cannot enter, mark, or track functions that are defined completely by a call to .Internal.
Also, for technical reasons, you cannot enter, mark, or track any of the seven functions listed
below:

assign, assign.default, exists, exists.default, invisible, on.exit, remove

Entering Functions

If you want to step through a function in the current expression, and
don’t plan to step through it if it is called again, use the enter
instruction. For example, while inspecting the call
lm(stack.loss ~ stack.x), you might want to step through the function
model.extract. After stepping to the call to model.extract, you issue
the enter instruction:

d> step
stopped in lm (frame 3), at:
        model.extract(m, weights)
d> enter
entering function model.extract
stopped in model.extract (frame 4), at:
        what <- substitute(component)

Marking Functions

To stop in a function each time it is called, use the mark instruction.
For example, the ar.burg function makes several calls to array. If we
want to stop in array while inspecting ar.burg, we issue the mark
instruction and type the name of the function to be marked. By
default, a breakpoint is inserted at the beginning and end of the
function:

d> mark array
entry mark set for array
exit mark(s) set for array

By default, each time the evaluator encounters a marked function, it
stops once just after entering the function, and once just before
exiting. If you want to stop only at entry or only at exit, you can use
the optional at parameter to specify entry or exit as the desired
breakpoint. For example, to stop each time array is entered, use mark
as follows:

d> mark array at entry

entry mark set for array

To stop at the end of function evaluation for a function marked at
entry, use complete function to complete the function evaluation:

. . .
d> where
Frame numbers and calls:
5: debug.tracer(what = TR.GENERIC, index = c(4, 1)) from 4
4: array(0, dim = c(nser, nser, order.max + 1)) from 3
3: ar.burg(lynx) from 1
2: inspect(ar.burg(lynx)) from 1
1: from 1
---------------------
stopped in array (frame 4), at:
        data <- as.vector(data)
d> complete function
stopped in array (frame 4), at end;
return value from: return(data)
d>

To continue evaluation of the function being inspected, use resume:

d> resume
entering function array
stopped in array (frame 4), at:
        data <- as.vector(data)

Marking the Current Expression

You can mark the current expression by giving the mark instruction
with no arguments. This sets a breakpoint at the current expression.
This can be useful, for example, if you are inspecting a function with
an extensive loop inside it. If you want to stop at some expression in
the loop each time the loop is evaluated, you can mark the expression.
For example, consider again the bitstring function, defined in
Chapter 4, Writing Functions in S-PLUS. To check the value of n in
each iteration, you could use mark and eval together as follows. First,
start the inspection by calling bitstring, then step to the first
occurrence of the expression i <- i + 1. Issue the mark instruction,
use eval to look at n, then use resume to resume evaluation of the
loop. Each time the breakpoint is reached, evaluation stops. You can
then use eval to check n again:

> inspect(bitstring(107))
entering function bitstring
stopped in bitstring (frame 3), at:
        string <- numeric(32)
d>
. . .
d> step
stopped in bitstring (frame 3), at:
        i <- i + 1
d> mark
d> eval n
[1] 53
d> resume
stopped in bitstring (frame 3), at:
        i <- i + 1

Viewing and Removing Marks

Once you mark an expression, evaluation always stops at that
expression, until you unmark it. The inspector maintains a list of
marks, which you can view with the show instruction:

d> show marks
Marks:
1 : in array:
        data <- as.vector(data)
2 : in aperm:
        return(UseMethod("aperm"))

You can remove items from the list using the unmark instruction. With
no arguments, unmark unmarks the current expression. If the current
expression is not marked, you get a warning message. With one or
more integer arguments, unmark unmarks the expressions associated
with the given numbers:

d> show marks
Marks:
1 : in array:
        data <- as.vector(data)
2 : in aperm:
        return(UseMethod("aperm"))
d> unmark 2

With one or more name arguments, unmark unmarks the named
functions:

d> unmark array
entry mark unset for array

The instruction unmark all unmarks all expressions.

Tracking Functions

If you want to monitor the evaluation of a certain function, without
stopping inside the function, use the track instruction to track the
function. By default, a tracked function prints a message when it starts
and just before it completes. As with marked functions, however, you
can use the at parameter to specify entry or exit. You can perform
more sophisticated tracking by specifying an arbitrary S-PLUS
expression using the with parameter. For example, suppose you
simply want to monitor calls to array inside ar.burg, and view the
value returned by each call to array. You could do this by calling
track as follows:

> inspect(ar.burg(lynx))

entering function ar.burg

stopped in ar.burg (frame 3), at:
    if(is.factor(x) || (is.data.frame(x) && any(
        sapply(x, "is.factor"))))
        stop("cannot calculate the acf of factors"
    ...

d> track array at exit with cat("array returning",
    .ret.val., "\n ")
exit tracking enabled for array
d> resume

array returning 269 321 585 . . .
leaving function array
array returning 0 0 0 . . .
leaving function array
array returning 0 0 0 . . .
leaving function array
array returning 0
leaving function array
array returning 1.0877 -0.597623 0.251923 . . .
leaving function array
array returning 0 0 0 . . .
leaving function array
leaving function ar.burg . . .

The value .ret.val. is one of a number of values stored internally by
inspect; these are named with leading periods (most have trailing
periods, as well) to avoid conflicts with your own objects and standard
S-PLUS objects. You can track a function giving different actions for
entry and exit; this can be useful, for example, if you want to calculate
the elapsed time of evaluation. To do so, you could define a function
func.entry.time as follows:

func.entry.time <-
function(fun)
{
    # Store the entry time in frame 1 so the exit function can find it
    assign("StartTime", proc.time(), frame=1)
    cat(deparse(substitute(fun)), "entered at time",
        get("StartTime", frame=1), "\n ")
}

Then define the exit function, func.exit.time as follows:

func.exit.time <-
function(fun)
{
    assign("StopTime", proc.time(), frame=1)
    # Elapsed time is the exit time minus the stored entry time
    assign("ElTime", get("StopTime", frame=1) -
        get("StartTime", frame=1), frame=1)
    cat(deparse(substitute(fun)), "took time",
        get("ElTime", frame=1), "\n ")
}

You can then track a function at entry with func.entry.time and
track at exit with func.exit.time:

> inspect(ar.burg(lynx))

entering function ar.burg

stopped in ar.burg (frame 3), at:
    if(is.factor(x) || (is.data.frame(x) && any( sapply(x,
        "is.factor"))))
        stop("cannot calculate the acf of factors" ...

d> track array at entry with func.entry.time(array)

entry tracking enabled for array

d> track array at exit with func.exit.time(array)

exit expression for array changed to:


func.exit.time(array)

d> resume

entering function array


array entered at time 58.5 26.85 8303 2.64 13.14
array took time 0.5 0 1 0 0
entering function array
array entered at time 60.59 26.86 8306 2.64 13.14
array took time 0.599998 0.0100002 1 0 0
entering function array
. . .
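
The entry and exit functions above amount to recording proc.time()
before a call and subtracting it afterwards. Outside the inspector, the
same arithmetic can be sketched in a single wrapper. Note that
time.call is a hypothetical name, not part of S-PLUS, and the sketch
relies on function arguments being evaluated only when first used:

```s
# Hypothetical helper (not part of S-PLUS): time one expression using
# the same proc.time() subtraction as func.exit.time.
time.call <- function(expr)
{
    start <- proc.time()            # times before evaluation
    value <- expr                   # the argument is evaluated here, on first use
    elapsed <- proc.time() - start  # element-wise difference of the time vectors
    cat("evaluation took time", elapsed, "\n")
    invisible(value)
}

time.call(array(1:24, dim = c(2, 3, 4)))
```

Unlike the tracked version, this keeps the start time in a local
variable rather than in frame 1, so nothing is left behind after the
call.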

You can suppress the automatic messages entering function fun and
leaving function fun by issuing the track instruction with the flag
print=F. In the previous example, the initial call to track specified
tracking on entry, so only the entry message was printed. To suppress
that message, add the flag print=F after the specification of entry
or exit:

d> track array at entry print=F with func.entry.time(array)

Modifying the Evaluation Frame

We have already seen one use of the eval instruction, to examine the
objects in the current evaluation frame. More generally, you can use
eval to evaluate any S-PLUS expression. In particular, you can modify
values in the current evaluation frame, with those values then being
used in the subsequent evaluation of the function being debugged.
Thus, if you discover where your error occurs, you can modify the
offending expression, evaluate it, and assign the appropriate value in
the current frame. If the fix works, the complete evaluation should
give the correct results. Of course, you still need to make the change
(with the fix function) in the actual function. But using eval provides
a useful testing tool inside the inspector. For example, once we have
identified the problem in make.data.sets as occurring in the call to
paste, we can go to the point at which the faulty names have been
created:

> inspect(make.data.sets(5))

entering function make.data.sets


stopped in make.data.sets (frame 3), at:
names <- paste("data", 1:n)

d> step 2

stopped in make.data.sets (frame 3), at:


return(for(i in 1:n)
{ assign(names[i], runif(100), where = 1 )
}
...

d> objects

[1] ".Auto.print" ".entered." ".name." "n"


[5] "names"

d> eval names

[1] "data 1" "data 2" "data 3" "data 4" "data 5"

Here we see that the names are not what we wanted. To test our
assumption that we need the sep="" argument, use eval as follows:

d> eval names <- paste("data", 1:n, sep="")


d> eval names

[1] "data1" "data2" "data3" "data4" "data5"

Our change has given the correct names; now resume evaluation and
see if the data sets are actually created:

d> resume

leaving function make.data.sets

> data1

[1] 0.94305062 0.61680487 0.15296083 0.25405207


[5] 0.81061184 . . .
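
Once the eval experiment confirms the diagnosis, the permanent fix is
the same one-argument change in the function definition. A sketch of
the corrected function, assuming the rest of make.data.sets is
unchanged from the definition used in this chapter:

```s
make.data.sets <-
function(n)
{
    # sep = "" removes the default blank between "data" and the number
    names <- paste("data", 1:n, sep = "")
    for(i in 1:n)
    {   assign(names[i], runif(100), where = 1)
    }
}
```

You would normally make this edit with the fix function after leaving
the inspector, not while the inspection is still running.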


Error Actions in the Inspector

When an error occurs in the function being inspected, inspect calls
the current error.action. By default, this action has three parts, as
follows:
1. Produce a traceback of the sequence of function calls at the
   time of the error.
2. Dump the frames existing at the time of the error.
3. Start a restricted version of inspect that allows you to
   examine frames and evaluate expressions, but not proceed
   with further evaluation of the function being inspected.
Thus, you can examine the evaluation frame and the objects within it
at the point the error occurred. You can use the up and down
instructions to change frames, and the objects, find, on.exit, and
return.value instructions to examine the contents of the frames. The
instructions eval, fundef, help, and quit are also available in the
restricted version of inspect. For example, consider the primes
function described in Chapter 4, Writing Functions in S-PLUS. We can
introduce an error by commenting out the line that defines the
variable smallp:

primes <-
function(n = 100)
{
n <- as.integer(abs(n))
if(n < 2)
return(integer(0))
p <- 2:n
# smallp <- integer(0)
#
# the sieve
repeat
{ i <- p[1]
smallp <- c(smallp, i)
p <- p[p %% i != 0]
if(i > sqrt(n))
break
}
c(smallp, p)
}

Now call inspect with a call to primes:


> inspect(primes())

entering function primes


stopped in primes (frame 3), at:
n <- as.integer(abs(n))

d> do 2

stopped in primes (frame 3), ahead of loop:


repeat
{ i <- p[1]
smallp <- c(smallp, i)
...

d> do

Error in primes(): Object "smallp" not found


Calls at time of error:

4: error = function() from 3


3: primes() from 1
2: inspect(primes()) from 1
1: from 1

Dumping frames ...


Dumped

local frame (frame of error) is primes (frame 3)

A quick glance at the frame of primes() with objects shows that
smallp is indeed not defined. Use the quit instruction to end the
inspect session, then start it again. You can then use the eval
instruction to specify an initial value for smallp, and watch the
function complete successfully:

d> quit
> inspect(primes())

entering function primes


stopped in primes (frame 3), at:
n <- as.integer(abs(n))

d> do 2

stopped in primes (frame 3), ahead of loop:


repeat {
i <- p[1]
smallp <- c(smallp, i)
...

d> eval smallp <- numeric(0)


d> resume

leaving function primes


[1] 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59
[18] 61 67 71 73 79 83 89 97

You can then edit the primes function to fix the error.
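
For reference, the repaired definition is simply the version shown
earlier with the smallp initialization restored. The sketch below
assumes no other changes to the function:

```s
primes <-
function(n = 100)
{
    n <- as.integer(abs(n))
    if(n < 2)
        return(integer(0))
    p <- 2:n
    smallp <- integer(0)    # restored: primes found so far
    #
    # the sieve
    repeat
    {   i <- p[1]
        smallp <- c(smallp, i)
        p <- p[p %% i != 0]  # drop every multiple of i
        if(i > sqrt(n))
            break
    }
    c(smallp, p)
}

primes(30)
```

The eval session above used numeric(0) as the starting value; either
initial value lets the sieve run, because c() extends an empty vector
in the same way.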

Limitations of Inspect

• Functions defined within other functions or in function calls or argument default
expressions cannot be tracked. They should still work normally, though. Also, avoid
assigning functions on frame 1, especially if you want to track them.
• Complex expressions inside if, while, and other conditions are never tracked. If
you want to track them, assign them to a name outside the test condition and use
the name inside the condition.
• Do not use trace if you plan to use inspect. The trace function creates a
modified version of the function being traced, as does inspect. The
modifications may not be completely compatible.
• Do not try to edit functions (using S-PLUS functions such as fix) while running
inspect. In particular, do not edit functions that you are tracking, have marked,
or have entered and not yet exited.
• Avoid using inspect on functions involving calls to Recall.
• You will see some extra frames and objects in the inspection mode that are not
there in normal evaluation. These objects have names which are unlikely to
conflict with those of the functions being inspected.
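
The second limitation above suggests a small rewrite before inspecting
a function: compute the complex condition once, give it a name, and
test only the name. A minimal sketch, in which safe.mean and
mostly.missing are hypothetical names chosen for illustration:

```s
# Hypothetical example: the complex test is named outside the
# condition, so only a simple name appears inside if().
safe.mean <- function(x)
{
    # Name the complex condition outside the test ...
    mostly.missing <- sum(is.na(x)) > length(x)/2
    # ... and use only the name inside the condition.
    if(mostly.missing)
        return(NA)
    mean(x[!is.na(x)])
}

safe.mean(c(1, 2, 3, NA))
```

With the condition stored in mostly.missing, you can stop on the
assignment and examine its value with eval like any other local
object.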


OTHER DEBUGGING TOOLS


The inspect function provides a complete interactive debugging
environment, and we recommend it for all your normal S-PLUS
debugging needs. On occasion, however, you may find some of
S-PLUS’s other debugging tools of some use. This section briefly
describes these other tools—browser, debugger, and trace.

Using the S-PLUS Browser Function

The browser function is useful for debugging functions when you
know an error occurs after some point in the function. If you insert a
call to browser into your function at that point, you can check all
assignments up to that point, and verify that they are indeed the
correct ones. For example, to return to our make.data.sets example,
we could have replaced our original cat statement with a call to
browser:

make.data.sets <-
function(n)
{
names <- paste("data", 1:n)
browser()
for(i in 1:n)
{ assign(names[i], runif(100), where = 1)
}
}

When we call make.data.sets, we get a new prompt
b(make.data.sets)> to indicate we are in the browser, and a message
telling us which function the browser was called from:

> make.data.sets(5)
Called from: make.data.sets(5)
b(make.data.sets)>

Type ? at the prompt to get brief help on the browser, plus a listing of
the variables in the local frame:

b(make.data.sets)> ?
Type any expression. Special commands:
`up', `down' for navigation between frames.
`c' # exit from the browser & continue
`stop' # stop this whole task
`where' # where are we in the function calls?

Browsing in frame of make.data.sets(5)
Local Variables: n, names
b(make.data.sets)> names
[1] "data 1" "data 2" "data 3" "data 4" "data 5"

To leave the browser, type either c or q at the prompt:

b(make.data.sets)> q
>

You can type arbitrary S-PLUS expressions at the browser prompt.
These expressions are evaluated in the chosen frame, which is
indicated by the function name within the prompt. For example,
b(make.data.sets)> indicates that you are in browser in the frame of
the function make.data.sets. You can therefore type alternative
expressions and see if a possible fix will actually work.

Using the S-PLUS Debugger

If a function is broken, so that it returns an error reliably when called,
there is an alternative to all those cat and browser statements: the
debugger function. To use debugger on a function, you must have the
function’s list of frames dumped to disk. You can do this in several
ways:
• Call dump.frames() from within the function.
• Call dump.frames() from the browser.
• Set options(error=expression(dump.frames())). If you use
  this option, you should reset it to the default
  (expression(dump.calls())) when you are finished
  debugging, because dumped frames can be quite large.
Then, when an error occurs, you can call the debugger function with
no arguments, which in turn uses the browser function to let you
browse through the dumped frames of the broken function. Use the
usual browser commands (?, up, down, and frame numbers) to move
through the dumped frames.
For example, consider the following simple function:

debug.test <-
function()
{
    x <- 1:10
    sin(z)
}

This has an obvious error in the second line of the body, so it will fail
if run. To use debugger on this function, do the following:

> options(error=expression(dump.frames()))
> debug.test()
Problem in debug.test(): Object "z" not found
Evaluation frames saved in object "last.dump", use
debugger() to examine them
> debugger()
Message: Problem in debug.test(): Object "z" not found
browser: Frame 11
b(sin)>

You are now in the browser, and can view the information in the
dumped frames as described above.
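
When you have finished with debugger, remember the earlier advice
about the size of dumped frames. The following line restores the
default error action named in the list above:

```s
# Restore the default error action once debugging is finished,
# because dumped frames can be quite large.
options(error = expression(dump.calls()))
```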

Tracing Function Evaluation

Another way to use the browser function is with the trace function,
which modifies a specified function so that some tracing action is
taken whenever that function is called. You can specify that the action
be to call the browser function (with the statement tracer = browser)
providing yet another way to track down bugs.

Warning: trace and inspect clash

Do not use trace on any function if you intend to do your debugging with inspect.

For example, suppose we wanted to trace our make.data.sets function:

> trace(make.data.sets,browser)
> make.data.sets
function(n) {
if(.Traceon)
{ .Internal(assign(".Traceon", F, where = 0),
"S_put")
cat("On entry: ")
browser()
.Internal(assign(".Traceon", T, where = 0),
"S_put")


} else
{ names <- paste("data", 1:n)
for(i in 1:n)
{ assign(names[i], runif(100), where = 1)
}
}
}

The trace function copies an edited version of the traced function
into the session frame, and maintains a list of all functions which are
currently being traced. Since S-PLUS finds objects in the session frame
before looking in directories, do not try to edit a function that is
currently being traced. If, for instance, you call fix(make.data.sets)
while make.data.sets is being traced, you overwrite the copy of
make.data.sets in your working directory with the edited version,
which contains several calls to .Internal. The additions include the
call to the tracer, in this case browser. The object .Traceon specifies
whether tracing is enabled; you can change this value with the
trace.on function.

If we now call make.data.sets, we find ourselves in the browser, in
the make.data.sets frame:

> make.data.sets(3)
On entry: Called from: make.data.sets(3)
b(2)> ?
1: n
b(2)>

However, trace, by default, puts the call to browser at the beginning of
the function, so that we actually see less information in the browser
than we hoped; in particular, we don’t see the value of names. We can,
however, run the expression to create the names:

b(2)> names <- paste("data", 1:n)


b(2)> names
[1] "data 1" "data 2" "data 3"

From this, we discover, as before, that our paste expression needs
modification, and as before we can test our proposed change before
implementing it. After leaving browser, type
untrace(make.data.sets) to remove the traced function from the
list.
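
Because a traced function must not be edited, the order of operations
matters. As a sketch of one safe sequence, using only the calls already
introduced in this section:

```s
trace(make.data.sets, browser)   # start tracing; browser() runs on entry
make.data.sets(3)                # opens the browser prompt; type c to continue
untrace(make.data.sets)          # stop tracing before editing the function
fix(make.data.sets)              # now safe to edit the original definition
```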

If we had wanted the call to browser after the names assignment, we
could have used the at argument to trace:

> trace(make.data.sets,browser,at=2)
> make.data.sets
function(n) {
names <- paste("data", 1:n)
{ if(.Traceon)
{ .Internal(assign(".Traceon", F,
where = 0), "S_put")
cat("At 2: ")
browser()
.Internal(assign(".Traceon", T,
where = 0), "S_put")
}
for(i in 1:n)
{ assign(names[i], runif(100), where = 1)
}
}
}

Now if we call make.data.sets, our browser session looks much like
the one in the previous section:

> make.data.sets(3)
At 2: Called from: make.data.sets(3)
b(2)> ?
1: names
2: n
b(2)>

7 EDITABLE GRAPHICS COMMANDS

Introduction 239
Getting Started 241
Graphics Objects 244
Graph Sheets 244
Graphs 244
Axes 245
Plots 245
Annotations 245
Object Path Names 246
Graphics Commands 249
Plot Types and Plot Classes 249
Viewing Argument Lists and Online Help 252
Specifying Data 253
Display Properties 254
Displaying Dialogs 257
Plot Types 258
The Plots2D and ExtraPlots Palettes 258
The Plots3D Palette 272
Titles and Annotations 281
Titles 281
Legends 281
Other Annotations 282
Locating Positions on a Graph 285
Formatting Axes 287
Formatting Text 289
Modifying the Appearance of Text 290
Superscripts and Subscripts 291
Greek Text 291
Colors 292
Layouts for Multiple Plots 293
Combining Plots on a Graph 293
Multiple Graphs on a Single Page 293
Multiple Graphs on Multiple Pages 294
Conditioned Trellis Graphs 294
Specialized Graphs Using Your Own Computations 295

INTRODUCTION
Chapter 3 through Chapter 6 in the User’s Guide introduce the
editable graphics system that is part of the S-PLUS graphical user
interface. As the chapters are part of the User’s Guide, they focus on
creating and customizing editable graphics via the point-and-click
approach. In this chapter, we show how to create and modify such
graphics by calling S-PLUS functions directly. All of the graphics
available in the Plots2D, Plots3D, and ExtraPlots palettes can be
generated by pointing and clicking, or by typing commands in the
Script and Commands windows. Likewise, editable graphs can be
modified by using the appropriate dialogs and the Graph toolbar, or
by calling functions that make the equivalent modifications.

Note

The graphics produced by the Statistics menus and dialogs are traditional graphics. See Chapter
8 and Chapter 9 in the Programmer’s Guide for details.

An editable graph contains numerous graph objects, such as axes and
annotations. Each field in a graphics dialog corresponds to a property
of an object. Similarly, toolbar actions such as changing plot colors or
line styles are changes in the values of an object’s properties. Each
time you create or modify a graph through the S-PLUS GUI, the
corresponding command is recorded in the History Log. This
provides an easy way for you to generate programming examples and
familiarize yourself with the editable graphics functions. The basic
procedure is:
1. Select Window > History > Clear to clear your History
   Log.
2. Create a graph using the plot palettes and modify it with the
   dialog and toolbar options.
3. Select Window > History > Display to view the commands
   that created your graphic.

By default, S-PLUS writes a condensed History Log. You can also
record a full History Log by selecting Options > Undo & History
and changing the History Type to Full. The main differences
between the condensed and full History Log are:
• Calls to the guiPlot function are recorded in the condensed
History Log while calls to the guiCreate function are
recorded in the full History Log. We discuss guiPlot in the
section Getting Started on page 241 and we discuss both
functions in the section Graphics Commands on page 249.
• The guiPlot calls in the condensed History Log include only
those arguments that are different from their default values.
The guiCreate calls in the full History Log include all
arguments, even if they are not used explicitly to create a
particular plot.
• The condensed History Log records plotting commands
while the full History Log also records commands that
initialize graph sheets, open palettes, and close dialogs.
Rather than attempt to learn the language of S-PLUS editable graphics
from scratch, we encourage you to make extensive use of the History
Log for programming examples and templates. For more information
on the History Log, see Chapter 11 in the User’s Guide.
In the section Getting Started on page 241, we provide an overview of
the guiPlot function and show how it corresponds to particular GUI
actions. We then describe the types of objects that constitute an
editable graphic in S-PLUS. We provide examples that show how you
can create each type of editable plot programmatically, and then
show how you can modify different properties of the plots. Finally, we
illustrate how to place multiple plots on a single page, including how
to create multipanel Trellis graphics.
In this chapter, we assume that you are familiar with components of
the S-PLUS graphical user interface such as toolbars, toolbar buttons,
palettes, and dialogs. If you are not, you may wish to review Chapter
3 through Chapter 6 in the User’s Guide before proceeding.


GETTING STARTED
The guiPlot function emulates the action of interactively creating
plots by first selecting columns of data and then clicking on a button
in a plot palette. The colors, symbol types, and line styles used by
guiPlot are equivalent to those specified in both the Options >
Graph Styles dialogs and the individual graphics dialogs. The
arguments to guiPlot are:

> args(guiPlot)
function(PlotType = "Scatter", NumConditioningVars = 0,
Multipanel = "Auto", GraphSheet = "", AxisType = "Auto",
Projection = F, Page = 1, Graph = "New", Rows = "",
Columns = "", ...)

The PlotType argument is a character string that matches the name of
the plot button as displayed in its tool tip. To see the appropriate
string for a plot, hover your mouse over its button in one of the plot
palettes until the tool tip appears. Alternatively, use the
guiGetPlotClass function to see a list of all plot types that guiPlot
accepts:

> guiGetPlotClass()
[1] "Scatter" "Isolated Points"
[3] "Bubble" "Color"
[5] "Bubble Color" "Text as Symbols"
[7] "Line" "Line Scatter"
[9] "Y Series Lines" "XY Pair Lines"
[11] "Y Zero Density" "Horiz Density"
[13] . . .

The default value PlotType="Scatter" produces a simple scatter plot.
For example, the command

guiPlot("Scatter", DataSet = "fuel.frame",
    Columns = "Mileage, Weight")

emulates the following actions:


1. Highlight the Mileage column in the fuel.frame data set.
CTRL-click to simultaneously highlight the Weight column.
2. Click the Scatter button in the Plots2D palette.
Either approach displays a scatter plot of Weight versus Mileage.
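
Because PlotType is just a character string, switching plot types is a
one-word change. As a sketch, assuming the "Line" entry listed by
guiGetPlotClass accepts the same two columns as "Scatter":

```s
# "Line" is one of the plot types listed by guiGetPlotClass() above.
guiPlot("Line", DataSet = "fuel.frame",
    Columns = "Mileage, Weight")
```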

The AxisType argument to guiPlot allows you to define different axis
types exactly as you do from the Standard toolbar. It accepts a string
that matches the name of the axis type as it appears in the Default 2D
Axes Type list. For example, the following call creates a graph with a
Log Y axis:

guiPlot("Scatter", DataSet = "fuel.frame",
    Columns = "Mileage, Weight", AxisType = "Log Y")

This command is equivalent to specifying Log Y axes in the
Standard toolbar before clicking the Scatter button in the Plots2D
palette.
Similarly, the following command creates a graph with two overlaid
plots: one showing Weight versus Mileage and the other showing
Disp. versus Mileage. The AxisType argument is set to "Multiple Y"
so that the y axis for the second plot appears along the right side of the
graph sheet, while the y axis for the first plot appears on the left.

guiPlot("Scatter", DataSet = "fuel.frame",
    Columns = "Mileage, Weight, Disp.",
    AxisType = "Multiple Y")

The following call places the plots in two separate panels that have
the same x axis scaling but different y axis scaling:

guiPlot("Scatter", DataSet = "fuel.frame",
    Columns = "Mileage, Weight, Disp.",
    AxisType = "Vary Y Panel")

The NumConditioningVars argument allows you to create Trellis
conditioning plots using guiPlot. For example, the command

guiPlot("Scatter", DataSet = "fuel.frame",
    Columns = "Mileage, Weight, Type",
    NumConditioningVars = 1)

emulates the following GUI actions:


1. Click the Set Conditioning Mode button in the Standard
toolbar.
2. Highlight the Mileage column in the fuel.frame data set.
CTRL-click to simultaneously highlight the Weight and Type
columns.
3. Click the Scatter button in the Plots2D palette.

Either approach creates a scatter plot of Weight versus Mileage for
each type of car. The last variable specified in the Columns argument
to guiPlot (or highlighted in the Data window) is always used as the
conditioning variable. We discuss the NumConditioningVars argument
more in the section Conditioned Trellis Graphs on page 294.
S-PLUS writes guiPlot commands to the condensed History Log
when you create a graph interactively. If the History Type is set to
Full instead of Condensed, S-PLUS writes guiCreate commands to
the History Log; we discuss guiCreate more in the section Graphics
Commands on page 249. You can write your own examples using
guiPlot by creating the desired plot type and then viewing the
condensed History Log.
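
All of the calls in this section pass the column names as a single
comma-separated string. If your column names are held in a character
vector, a small wrapper can build that string for you. The sketch below
is illustrative only: gui.scatter is a hypothetical helper, not an
S-PLUS function, and it simply forwards any extra arguments to guiPlot:

```s
# Hypothetical helper: build the Columns string from a character
# vector and pass everything else through to guiPlot.
gui.scatter <- function(data.name, columns, ...)
{
    guiPlot("Scatter", DataSet = data.name,
        Columns = paste(columns, collapse = ", "), ...)
}

# Equivalent to the conditioned scatter plot shown above.
gui.scatter("fuel.frame", c("Mileage", "Weight", "Type"),
    NumConditioningVars = 1)
```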


GRAPHICS OBJECTS
There are five main types of graphics objects in the editable graphics
system: graph sheets, graphs, axes, plots, and annotations. Plots are
contained in graphs, and graphs are contained in graph sheets. Most
graphics objects cannot exist in isolation. If a graphics object is
created in isolation, it generates an appropriate container. For
example, when you create a plot, the appropriate graph, axes and
graph sheet are automatically configured and displayed.
In general, the simplest way to create plots is with guiPlot. You can
create all types of graphics objects with the guiCreate function. The
properties of graphics objects can be modified using the guiModify
function. In this section, we briefly describe each of the graphics
objects; the section Graphics Commands on page 249 discusses
guiPlot, guiCreate, and guiModify in more detail.

Graph Sheets

Graph sheets are the highest-level graphics object. They are documents
that can be saved, opened, and exported to a wide variety of graphics
formats. Graph sheet properties determine the orientation and shape of
the graph sheet, the units on the axes, the default layout used when
new graphs are added, and any custom colors that are available for
other objects. Graph sheets typically contain one or more graphs in
addition to annotation objects such as text, line segments, arrows, and
extra symbols.

Graphs

There are six types of graphs in the editable graphics system: 2D, 3D,
Matrix, Smith, Polar, and Text. The graph type determines the
coordinate system used within the graph:
• A 2D graph can have one or more two-dimensional
coordinate systems, each composed of an x and y axis.
• A 3D graph has a single three-dimensional coordinate system
defined by a 3D axes object.
• A Matrix graph has a set of two-dimensional coordinate
systems drawn in a matrix layout.
• Smith plots are specialized graphs used in microwave
engineering that have a single two-dimensional coordinate
system.

• A Polar graph has a single polar coordinate system.


• Text graphs display pie charts and have no coordinate system
other than the measurements of the graph sheet.
Graph properties determine the size and shape of both the graph area
and the plot area. You can fill both areas with colors and include
borders around them. All graphs support the Trellis paradigm of
displaying multiple panels; the graph properties determine the
conditioning data and the layout of the panels. The 3D graphs also
have properties that determine the shape and view angle of the 3D
workbox.

Axes

The characteristics of the coordinate systems within graphs are set by
the properties of axes objects. Typically, axes properties contain
information about the range, tick positions, and display characteristics
of an axis, such as line color and line weight. Axes for 2D graphs also
have properties that determine scaling and axis breaks. All axes other
than those for 2D graphs contain information about tick labels and
axis titles; 2D axes contain separate objects for tick labels and axis
titles, both of which have their own properties.

Plots

A plot contains data specifications and options relating to how the
data are displayed. In many cases, a plot determines the type of
calculations that S-PLUS performs on the data before drawing the plot.
A plot is always contained within a graph and is associated with a
particular type of coordinate system. For example, a 2D graph can
contain any of the following plot types, among others: bar charts, box
plots, contour plots, histograms, density plots, dot charts, line plots,
and scatter plots. Plot properties are components that describe aspects
of the plot such as the line style and color.

Annotations

Annotation objects can be placed directly on a graph sheet or included
within a graph. If annotations are contained in a graph, S-PLUS
repositions them as the graph is repositioned on the page. Annotation
properties control display information such as line color and line style.
They also control an annotation’s position on the graph or graph
sheet; the units that define the position can be either document units
as determined by the graph sheet, or axes units as determined by the
graph’s coordinate system. Examples of annotations include titles and
legends, which we discuss more in the section Titles and Annotations
on page 281.

Object Path Names

Every graph object in S-PLUS has a unique path name that identifies
it. A valid path name has the following components:
• The first component is the name of the graph sheet preceded
by $$.
• The name of the graph sheet is followed by the graph number
or annotation number.
• The name of the graph is followed by the plot number, axis
number, or annotation number.
• The name of an annotation can be followed by numbers that
correspond to specific components. For example, legends are
annotations that can contain legend items, which control the
display of individual entries in a legend.
• In 2D graphics, the name of an axis can be followed by
numbers that correspond to tick labels or axis titles.
• The name of some plots can be followed by numbers that
correspond to particular plot components. For example,
confidence intervals are components that are associated with
specific curve fit plots.
The components in the path name for a graph object are separated by
dollar signs. You can think of the individual components as
containers. For example, plots are contained within graphs, and
graphs are contained within graph sheets; therefore, the path name
$$GS1$1$1 refers to the first plot in the first graph of the graph sheet
named GS1. Likewise, annotations can be contained within graphs, so
the path name $$GS1$1$1 can also refer to the first annotation in the
first graph of GS1. Figure 7.1 visually displays this hierarchy of object
path names.
If a path name does not include the name of a graph sheet, S-PLUS
assumes it refers to the current graph sheet instead. The current graph
sheet is the one that was most recently created, modified, or viewed.
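
Since a path name is only a string of components joined by dollar
signs, you can assemble one with paste. The following sketch uses a
hypothetical helper, gs.path, which is not part of S-PLUS:

```s
# Hypothetical helper: join path components with dollar signs.
# The "$$" prefix marks the graph sheet name.
gs.path <- function(sheet, ...)
{
    paste(c(paste("$$", sheet, sep = ""), ...), collapse = "$")
}

gs.path("GS1", 1, 1)    # first plot (or annotation) in graph 1 of GS1
gs.path("GS1", 1)       # graph 1 of GS1
```

The first call returns the string $$GS1$1$1 used in the example above.
The helper only builds the string; it does not check that the object
exists.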

Graph Sheet
    Annotation
        Annotation Components
    Graph
        Annotation
            Annotation Components
        Axis
            Tick Label
            Axis Title
        Plot
            Plot Components

Figure 7.1: Hierarchy of graph objects in path names. Each node in the tree can be a
component of a path name. To construct a full path name for a particular type of
graph object, follow a branch in the tree and place dollar signs between the names in
the branch.

You can use the following functions to obtain path names for specific
types of graph objects. Most of the functions accept a value for the
GraphSheet argument, which is a character vector giving the name of
the graph sheet. By default, GraphSheet="" and the current graph
sheet is used.
• guiGetAxisLabelsName: Returns the path name of the tick
labels for a specified axis. By default, S-PLUS returns the path
name of the labels for axis 1, which is the first x axis in the first
plot on the graph sheet.
• guiGetAxisName: Returns the path name of a specified axis.
By default, the path name for axis 1 is returned.
• guiGetAxisTitleName: Returns the path name of the title for a
specified axis. By default, the path name of the title for axis 1
is returned.

• guiGetGSName: Returns the path name of the current graph
sheet.
• guiGetGraphName: Returns the path name of the graph with a
specified graph number.
• guiGetPlotName: Returns the path name of the plot with the
specified graph and plot numbers.
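
The following sketch shows how these functions might be called once an editable graph exists. It relies only on default arguments; that guiGetGraphName can be called without a graph number is an assumption here, not something stated above.

```s
# Create a graph sheet with one graph and one plot to query.
guiPlot("Scatter", DataSet = "fuel.frame",
    Columns = "Weight, Mileage")

guiGetGSName()           # path name of the current graph sheet
guiGetGraphName()        # path name of a graph (default assumed)
guiGetPlotName()         # path name of the plot
guiGetAxisName()         # path name of axis 1
guiGetAxisTitleName()    # path name of the title for axis 1
guiGetAxisLabelsName()   # path name of the tick labels for axis 1
```

Each call returns a path name that can be passed to other graphics commands to identify the object.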


GRAPHICS COMMANDS
This section describes the programming interface to the editable
graphics system. The three main functions we discuss are guiPlot,
guiCreate, and guiModify. You can use guiPlot and guiCreate to
draw graphics and guiModify to change particular properties of
your plots. For detailed descriptions of the plot types and their GUI
options, see the User’s Guide.
Throughout this chapter, we emphasize using guiPlot over
guiCreate to generate editable graphics. This is primarily because
guiPlot is easier to learn for basic plotting purposes. In this section,
however, we provide examples using both guiPlot and guiCreate.
The main differences between the two functions are:
• The guiPlot function is used exclusively for editable
graphics, while guiCreate can be used to create other GUI
elements such as new Data windows and Object Explorer
pages.
• The guiPlot function accepts a plot type as an argument while
guiCreate accepts a plot class. We discuss this distinction more
in the subsection below.
• Calls to guiPlot are recorded in the condensed History Log
while calls to guiCreate are recorded in the full History Log.
If you are interested solely in the editable graphics system, we
recommend using guiPlot to create most of your plots. If you are
interested in programmatically customizing the S-PLUS graphical user
interface, using guiCreate to generate graphics may help you become
familiar with the syntax of the required function calls.

Plot Types and Plot Classes

S-PLUS includes a large number of editable plot types, as evidenced
by the collective size of the three plot palettes. Plot types are
organized into various plot classes, so that the plots in a particular class
share a set of common properties. To see a list of all classes for the
S-PLUS graphical user interface (of which the plot classes are a subset),
use the guiGetClassNames function.

> guiGetClassNames()

 [1] "ActiveDocument" "Application"    "Arc"
 [4] "Area"           "AreaPanel"      "AreaPlot"
 [7] "Arrow"          "attribute"      "Axes3D"
[10] "AxesMatrix"     "AxesPolar"      "Axis2DLabelX"
[13] "Axis2DLabelY"   "Axis2dX"        "Axis2dY"
[16] "AxisPanel"      "AxisPanelLabel" "AxumBoxPlot"
[19] "Bar"            "BarPanel"       "BarPlot"
[22] "Box"            "BoxPanel"       "BoxPlot"
[25] . . .

See the section Plot Types on page 258 for comprehensive lists of
plots and their corresponding plot classes. Table 7.1 lists the most
common classes for the remaining graph objects (graph sheets,
graphs, axes, and annotations).
Table 7.1: Common classes for graph objects. This table does not include plot classes.

Graph Object         GUI Classes

Graph Sheets         GraphSheet, GraphSheetPage, GraphSheetPageItem.

Graphs               Graph2D, Graph3D, GraphMatrix, GraphPolar,
                     GraphSmith, TextGraph.

Axes                 Axes3D, AxesMatrix, AxesPolar, Axis2DLabelX,
                     Axis2DLabelY, Axis2dX, Axis2dY.

Titles               MainTitle, Subtitle, XAxisTitle, YAxisTitle.

Legends              Legend, LegendItem, ScaleLegend.

Other Annotations    Arc, Arrow, CommentDate, ConfidenceBound, DateStamp,
                     Ellipse, Radius, ReferenceLine, Slice, Symbol.

You can create variations of basic plot types by modifying the
appropriate properties. When creating or modifying a plot, you
specify properties by name as arguments to the guiCreate and
guiModify functions. Thus, both guiCreate and guiModify accept
plot classes for their first arguments while guiPlot accepts plot types.

For example, Line, Scatter, and Line Scatter plots are all members
of the plot class LinePlot. You can create a scatter plot easily with
either guiPlot or guiCreate as follows:

guiPlot("Scatter", DataSet = "fuel.frame",
    Columns = "Weight, Mileage")

guiCreate("LinePlot", DataSet = "fuel.frame",
    xColumn = "Weight", yColumn = "Mileage")

Note that guiPlot accepts the plot type Scatter as its first
argument while guiCreate accepts the plot class LinePlot. The
guiCreate arguments DataSet, xColumn, and yColumn all define
properties of a LinePlot graphic; they correspond to the first three
entries on the Data to Plot page of the Line/Scatter Plot dialog.
To create a line plot with symbols using all of the default values, type:

guiPlot("Line Scatter", DataSet = "fuel.frame",
    Columns = "Weight, Mileage")

You can generate the same plot with guiCreate as follows:

guiCreate("LinePlot", DataSet = "fuel.frame",
    xColumn = "Weight", yColumn = "Mileage",
    LineStyle = "Solid")

Similarly, you can create a line plot without symbols using either of
the following commands:

guiPlot("Line", DataSet = "fuel.frame",
    Columns = "Weight, Mileage")

guiCreate("LinePlot", DataSet = "fuel.frame",
    xColumn = "Weight", yColumn = "Mileage",
    LineStyle = "Solid", SymbolStyle = "None")

In each of the above examples, S-PLUS opens a new graph sheet
containing a 2D graph with a set of x and y axes, and then draws the
plot within the graph.

Viewing Argument Lists and Online Help

You can obtain on-line help for guiPlot using the help function just
as you would for any other built-in command. The help files for
guiCreate and guiModify are structured by class name, however.
Typing help("guiCreate") displays a short, general help file; to see a
detailed help page, you must also include the class name. For
example, to see help on the LinePlot class, type:

> help("guiCreate(\"LinePlot\")")

The backslash characters are necessary so that S-PLUS recognizes the
nested quotation marks.
Similarly, you can use the guiPrintClass function to see a summary
of the information contained in the on-line help files. The output from
guiPrintClass includes the following:

• A list of arguments for the plot class.
• The dialog prompt that corresponds to each argument.
• The default value.
• Any options that are available.
For example, to see a summary of the LinePlot class, type:

> guiPrintClass("LinePlot")

CLASS: LinePlot
ARGUMENTS:
    Name
        Prompt: Name
        Default: ""
    DataSet
        Prompt: Data Set
        Default: "fuel.frame"
    xColumn
        Prompt: x Columns
        Default: ""
    yColumn
        Prompt: y Columns
        Default: ""
    zColumn
        Prompt: z Columns
        Default: ""
    . . .

The Prompt value gives the name of the field in the Line/Scatter
Plot dialog that corresponds to each argument. The Default entry
gives the default value for the argument, and Option List shows the
possible values the argument can assume.
The argument lists for guiCreate and guiModify are also organized
by class name. Instead of using the args function to see a list of
arguments, use the guiGetArgumentNames function. For example, the
following command lists the arguments and properties that you can
specify for the LinePlot class:

# The args function does not return a detailed list of
# arguments.
> args(guiCreate)
function(classname, ...)

> args(guiModify)
function(classname, GUI.object, ...)

# The guiGetArgumentNames function returns the arguments
# for a particular plot class.
> guiGetArgumentNames("LinePlot")

 [1] "Name"              "DataSet"
 [3] "xColumn"           "yColumn"
 [5] "zColumn"           "wColumn"
 [7] "PlotConditionType" "ConditionDataSet"
 [9] "ConditionColumns"  "PanelToDraw"
[11] "PointLabelsColumn" "RelativeAxisX"
[13] "RelativeAxisY"     "RelativePlane"
[15] "UseForAspectRatio" "Hide"
[17] "Crop"              "LineStyle"
[19] "LineColor"         "LineWeight"
[21] . . .

You can pass the properties returned by guiGetArgumentNames to
either guiCreate or guiModify. Each property corresponds to a field
in the dialog for the plot class. The properties returned by the above
command all have fields in the Line/Scatter Plot dialog.
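
One possible use of this list is to confirm that a property name is valid for a class before passing it to guiCreate or guiModify. The helper hasProperty below is hypothetical; it uses only guiGetArgumentNames and the standard match function.

```s
# Hypothetical helper: TRUE if 'prop' is listed among the arguments
# that guiGetArgumentNames reports for the GUI class 'cls'.
hasProperty <- function(cls, prop) {
    match(prop, guiGetArgumentNames(cls), nomatch = 0) > 0
}

hasProperty("LinePlot", "LineStyle")   # LineStyle appears in the list above
```

A check of this kind can help catch a misspelled property name early.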

Specifying Data

You can specify data for plots either by name or by value. The examples
so far in this section illustrate the syntax for specifying data by name.
The commands in the examples all refer to data sets and their
columns by the associated names, such as "fuel.frame", "Mileage",
and "Weight". In this case, the plot is live; it automatically updates
when you open it or bring it into focus after the values in the data set
have changed. With guiPlot, you specify data by name using the
DataSet and Columns arguments, which must be character vectors.
With guiCreate and guiModify, you specify data by name using the
arguments DataSet, xColumn, yColumn, zColumn and wColumn, all of
which accept character vectors.
Alternatively, a plot can store the data internally by value. The
expression used to specify the data is evaluated when the plot is
created and is not updated thereafter. This is similar to selecting
Graph ► Embed Data when you wish to embed data in a particular
GUI plot. To specify the data values that are used permanently within
a plot, use the argument DataSetValues in guiPlot. With guiCreate
and guiModify, use the arguments DataSetValues, xValues, yValues,
zValues, and wValues. All of these arguments accept S-PLUS
expressions such as subscripting statements and data frame names.
For example, to create a scatter plot of Mileage versus Weight that
stores a copy of the data internally in the graph sheet, use one of the
following commands:

guiPlot("Scatter", DataSetValues =
fuel.frame[, c("Mileage","Weight")])

guiCreate("LinePlot",
xValues = fuel.frame$Mileage,
yValues = fuel.frame$Weight)

If you generate plots from within a function, you may want to pass the
data by value if you construct the data set in the function as well.
S-PLUS erases the data upon termination of the function. Therefore,
any graphs the function generates by passing the data by name will be
empty.
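
As a sketch of this situation, the hypothetical function below builds a data frame locally and passes it to guiPlot by value, so the plot keeps a copy of the data after the function returns. The function name and its argument are illustrative only.

```s
# Hypothetical function: the data frame 'd' exists only while the
# function runs, so the plot must store the values internally.
plotNoisyLine <- function(n = 20) {
    d <- data.frame(x = 1:n, y = (1:n) + rnorm(n))
    guiPlot("Scatter", DataSetValues = d)
    invisible(d)    # return the data without printing it
}

plotNoisyLine(30)
```

Passing DataSet = "d" instead would refer to the local object by name, which is the case described above in which the resulting graph is empty.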

Display Properties

There are a number of display properties commonly used in plots
and annotation objects. Table 7.2 lists the properties that determine
the appearance of lines and symbols. They correspond to fields in the
Line and Symbol pages of many plot dialogs.

Table 7.2: Common display properties of plots and annotation objects.

Property:     LineColor
Description:  Color of the lines drawn between data points in the plot.
              Accepts a character vector naming the color.
Settings:     "Transparent", "Black", "Blue", "Green", "Cyan", "Red",
              "Magenta", "Brown", "Lt Gray", "Dark Gray", "Lt Blue",
              "Lt Green", "Lt Cyan", "Lt Red", "Lt Magenta", "Yellow",
              "Bright White", "User1", "User2", ..., "User16".

Property:     LineStyle
Description:  Style of the lines drawn between data points in the plot.
              Accepts a character vector naming the style.
Settings:     "None", "Solid", "Dots", "Dot Dash", "Short Dash",
              "Long Dash", "Dot Dot Dash", "Alt Dash", "Med Dash",
              "Tiny Dash".

Property:     LineWeight
Description:  Thickness of the lines drawn between data points in the
              plot. Accepts a numeric value measured in point size.

Property:     SymbolColor
Description:  Color of the symbols used to plot the data points.
              Accepts a character vector naming the color.
Settings:     Identical to the settings for LineColor.

Property:     SymbolStyle
Description:  Style of the symbols used to plot the data points.
              Accepts either an integer value representing the style
              or a character vector naming it.
Settings:     Integer values: 0, 1, 2, ..., 27.
              Corresponding character values:
              "None"; "Circle, Solid"; "Circle, Empty";
              "Box, Solid"; "Box, Empty";
              "Triangle, Up, Solid"; "Triangle, Dn, Solid";
              "Triangle, Up, Empty"; "Triangle, Dn, Empty";
              "Diamond, Solid"; "Diamond, Empty"; "Plus";
              "Cross"; "Ant"; "X"; "-"; "|"; "Box X"; "Plus X";
              "Diamond X"; "Circle X"; "Box +"; "Diamond +";
              "Circle +"; "Tri. Up Down"; "Tri. Up Box";
              "Tri. Dn Box"; "Female"; "Male".

Property:     SymbolHeight
Description:  Size of the symbols used to plot the data points.
              Accepts a numeric value measured in point size.

To use the properties listed in the table to change the appearance of
your plot, pass them as arguments to either guiCreate or guiModify.
For example, the following commands create a plot of Mileage versus
Weight where the points are light red, filled circles and the lines that
connect the points are light blue dashes.

# Create a basic plot with guiPlot and modify its
# properties with guiModify.
guiPlot("Scatter",
    DataSetValues = fuel.frame[, c("Mileage", "Weight")])
guiModify("LinePlot", Name = guiGetPlotName(),
    LineStyle = "Short Dash",
    LineColor = "Lt Blue",
    LineWeight = "1/2",
    SymbolStyle = "Circle, Solid", SymbolColor = "Lt Red")

You can accomplish the same thing using guiCreate as follows:

# Create a basic line plot with guiCreate and modify its
# properties with guiModify.
guiCreate("LinePlot", xValues = fuel.frame$Mileage,
    yValues = fuel.frame$Weight)
guiModify("LinePlot", Name = guiGetPlotName(),
    LineStyle = "Short Dash",
    LineColor = "Lt Blue",
    LineWeight = "1/2",
    SymbolStyle = "Circle, Solid", SymbolColor = "Lt Red")

In both of the above calls to guiModify, the guiGetPlotName function
returns the path name of the active plot. We discuss path names of
GUI objects in the section Object Path Names on page 246.

Because you can pass each of the properties in Table 7.2 to guiCreate
as well as to guiModify, you can also draw the plot using a single call
to guiCreate:

guiCreate("LinePlot", xValues = fuel.frame$Mileage,
    yValues = fuel.frame$Weight,
    LineStyle = "Short Dash",
    LineColor = "Lt Blue",
    LineWeight = "1/2",
    SymbolStyle = "Circle, Solid", SymbolColor = "Lt Red")

Displaying Dialogs

You can use the guiDisplayDialog function to open the property
dialog for a particular graph object. For example, the following
command displays the dialog for the current plot of class LinePlot:

guiDisplayDialog("LinePlot", Name = guiGetPlotName())

The properties for the plot may be modified using the dialog that
appears.


PLOT TYPES
The S-PLUS editable graphics system has a wide variety of available
plot types. In this section, we present guiPlot commands you can use
to generate each type of plot. The plots are organized first by palette
(Plots2D, ExtraPlots, and Plots3D) and then by plot class. We
discuss commands for customizing axes and layout operations in a
later section. For additional details on any of the plot types, see the
User’s Guide.
As we mention in the section Getting Started on page 241, you can
use the guiGetPlotClass function to see a list of all plot types that
guiPlot accepts. Once you know the name of a particular plot type,
you can also use guiGetPlotClass to return its class. For example, the
Bubble plot type belongs to the LinePlot class:

> guiGetPlotClass("Bubble")
[1] "LinePlot"

Knowing both the type and class for a particular plot allows you to
use guiPlot, guiCreate, and guiModify interchangeably.
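
The sketch below illustrates one way to combine the two names: the plot is created by type with guiPlot, and its class is then looked up with guiGetPlotClass so that guiModify can be called on it. The choice of the Loess plot type and of the LineColor property is illustrative.

```s
# Create a plot by type, then modify it through its class.
plotType <- "Loess"
guiPlot(plotType, DataSetValues =
    data.frame(util.mktbook, util.earn))

plotClass <- guiGetPlotClass(plotType)   # class that the type belongs to
guiModify(plotClass, Name = guiGetPlotName(),
    LineColor = "Lt Blue")
```

Looking up the class this way avoids hard-coding it when the plot type is held in a variable.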

The Plots2D and ExtraPlots Palettes

The Plots2D and ExtraPlots palettes contain a collection of
two-dimensional plots. Table 7.3 shows a quick description of the plot
classes and the plots that belong to each of them.
Table 7.3: The plot types available in the Plots2D and ExtraPlots palettes. The Plot Class
field gives the class that each plot type belongs to.

Plot Class:   LinePlot
Description:  Line and scatter plots.
Plot Types:   Scatter, Line, Line Scatter, Isolated Points,
              Text as Symbols, Bubble, Color, Bubble Color,
              Vert Step, Horiz Step, XY Pair Scatters, XY Pair Lines,
              High Density, Horiz Density, Y Zero Density, Robust LTS,
              Robust MM, Loess, Spline, Supersmooth, Kernel,
              Y Series Lines, Dot.

Plot Class:   LinearCFPlot
Description:  Linear curve fit plots.
Plot Types:   Linear Fit, Poly Fit, Exp Fit, Power Fit, Ln Fit,
              Log10 Fit.

Plot Class:   NonlinearCFPlot
Description:  Nonlinear curve fit plots.
Plot Types:   NLS Fit.

Plot Class:   MatrixPlot
Description:  Scatterplot matrices.
Plot Types:   Scatter Matrix.

Plot Class:   BarPlot
Description:  Bar plots.
Plot Types:   Bar Zero Base, Bar Y Min Base, Grouped Bar,
              Stacked Bar, Horiz Bar, Grouped Horiz Bar,
              Stacked Horiz Bar, Bar with Error,
              Grouped Bar with Error.

Plot Class:   HiLowPlot
Description:  High-low plots for time series data.
Plot Types:   High Low, Candlestick.

Plot Class:   BoxPlot
Description:  Box plots for a single or grouped variable.
Plot Types:   Box, Horizontal Box.

Plot Class:   AreaPlot
Description:  Area charts.
Plot Types:   Area.

Plot Class:   QQPlot
Description:  One- and two-sample quantile-quantile plots.
Plot Types:   QQ Normal, QQ.

Plot Class:   PPPlot
Description:  One- and two-sample probability plots.
Plot Types:   PP Normal, PP.

Plot Class:   ParetoPlot
Description:  Pareto plots.
Plot Types:   Pareto, Horizontal Pareto Plot.

Plot Class:   Histogram
Description:  Histograms and density curves.
Plot Types:   Histogram, Density, Histogram Density.

Plot Class:   PiePlot
Description:  Pie charts.
Plot Types:   Pie.

Plot Class:   ErrorBarPlot
Description:  Error bar plots.
Plot Types:   Error Bar, Horiz Error Bar, Error Bar - Both.

Plot Class:   ContourPlot
Description:  Contour and level plots.
Plot Types:   Contour, Filled Contour, Levels.

Plot Class:   VectorPlot
Description:  Vector plots.
Plot Types:   Vector.

Plot Class:   CommentPlot
Description:  Plots in which a third variable can be used to write
              comments on the graph.
Plot Types:   Comment.

Plot Class:   SmithPlot
Description:  Smith plots.
Plot Types:   Smith.

Plot Class:   PolarPlot
Description:  Polar line and scatter plots.
Plot Types:   Polar Line, Polar Scatter.

The LinePlot Class

The LinePlot class includes various kinds of line and scatter plots.
The scatter plot is the fundamental visual technique for viewing and
exploring relationships in two-dimensional data. Its extensions
include line plots, text plots, bubble plots, step plots, robust linear fits,
smooths, and dot plots. The line and scatter plots we illustrate here
are the most basic types of plots for displaying data. You can use
many of them to plot a single column of data as well as one data
column against another.

Scatter plot

guiPlot("Scatter", DataSetValues =
data.frame(util.mktbook, util.earn))

Line plot

guiPlot("Line", DataSetValues =
data.frame(util.mktbook, util.earn))

Line with scatter plot

guiPlot("Line Scatter", DataSetValues =
    data.frame(util.mktbook, util.earn))

Isolated points plot

guiPlot("Isolated Points", DataSetValues =
    data.frame(util.mktbook, util.earn))

Text as symbols plot

guiPlot("Text as Symbols", DataSetValues =
    data.frame(util.mktbook, util.earn, 1:45))
guiModify("LinePlot", Name = guiGetPlotName(),
    SymbolHeight = "0.2")

Bubble plot

guiPlot("Bubble", DataSetValues =
data.frame(util.mktbook, util.earn, 1:45))

Color plot

guiPlot("Color", DataSetValues =
data.frame(util.mktbook, util.earn, 1:45))

Bubble color plot

guiPlot("Bubble Color", DataSetValues =
data.frame(util.mktbook, util.earn, 45:1, 1:45))

Vertical step plot

guiPlot("Vert Step", DataSetValues =
    data.frame(x = 1:10, y = seq(from=2, to=20, by=2)))

Horizontal step plot

guiPlot("Horiz Step", DataSetValues =
    data.frame(x = 1:10, y = seq(from=2, to=20, by=2)))

XY pairs scatter plot

guiPlot("XY Pair Scatters", DataSetValues =
    data.frame(x1 = 1:10, y1 = rnorm(10, mean=1, sd=0.5),
    x2 = 6:15, y2 = rnorm(10, mean=5, sd=0.5)))

XY pairs line plot

guiPlot("XY Pair Lines", DataSetValues =
    data.frame(x1 = 1:10, y1 = rnorm(10, mean=1, sd=0.5),
    x2 = 6:15, y2 = rnorm(10, mean=5, sd=0.5)))

Vertical high density plot

guiPlot("High Density", DataSetValues =
    data.frame(util.mktbook, util.earn))

Horizontal high density plot

guiPlot("Horiz Density", DataSetValues =
    data.frame(util.mktbook, util.earn))

Y zero high density plot

guiPlot("Y Zero Density", DataSetValues =
    data.frame(x = 1:20, y = runif(20, min=-10, max=10)))

Robust least trimmed squares linear fit

guiPlot("Robust LTS", DataSetValues =
    data.frame(util.mktbook, util.earn))

Robust MM linear fit

guiPlot("Robust MM", DataSetValues =
    data.frame(util.mktbook, util.earn))

Loess smooth

guiPlot("Loess", DataSetValues =
data.frame(util.mktbook, util.earn))

Smoothing spline

guiPlot("Spline", DataSetValues =
data.frame(util.mktbook, util.earn))

Friedman’s supersmoother

guiPlot("Supersmooth", DataSetValues =
data.frame(util.mktbook, util.earn))

Kernel smooth

guiPlot("Kernel", DataSetValues =
data.frame(util.mktbook, util.earn))

Y series lines

guiPlot("Y Series Lines", DataSet = "hstart",
    Columns = "hstart")

Dot plot

guiPlot("Dot", DataSetValues =
data.frame(NumCars = table(fuel.frame$Type),
CarType = levels(fuel.frame$Type)))

The LinearCFPlot Class

The linear, polynomial, exponential, power, and logarithmic curve
fits all have class LinearCFPlot. Curve-fitting plots in this class display
a regression line with a scatter plot of the associated data points. The
curves are computed with an ordinary least-squares algorithm.

Linear fit

guiPlot("Linear Fit", DataSetValues =
    data.frame(util.mktbook, util.earn))

Polynomial fit

guiPlot("Poly Fit", DataSetValues =
    data.frame(util.mktbook, util.earn))

Exponential fit

guiPlot("Exp Fit", DataSetValues =
    data.frame(util.mktbook, util.earn))

Power fit

guiPlot("Power Fit", DataSetValues =
    data.frame(util.mktbook, util.earn))

Natural logarithmic fit

guiPlot("Ln Fit", DataSetValues =
    data.frame(util.mktbook, util.earn))

Common logarithmic fit

guiPlot("Log10 Fit", DataSetValues =
    data.frame(util.mktbook, util.earn))

The NonlinearCFPlot Class

The NonlinearCFPlot class includes a single plot type for fitting
nonlinear curves. In addition to the data, this type of plot needs a
formula and a vector of initial values for any specified parameters.
For this reason, it is usually easier to create the plot with a single call
to guiCreate, rather than sequential calls to guiPlot and guiModify.

Nonlinear fit

guiCreate("NonlinearCFPlot", DataSet = "Orange",
    xColumn = "age", yColumn = "circumference",
    Model = "circumference ~ A/(1 + exp(-(age-B)/C))",
    Parameters = "A=150, B=600, C=400")

The MatrixPlot Class

The MatrixPlot class includes a single plot type for displaying
scatterplot matrices. This type of plot displays an array of pairwise
scatter plots illustrating the relationship between any pair of variables
in a data set.

Scatterplot matrix

guiPlot("Scatter Matrix", DataSet = "fuel.frame",
    Columns = "Mileage, Weight, Type")

The BarPlot Class

A wide variety of bar plots are available in the editable graphics
system via the BarPlot class. A bar plot displays a bar for each point in
a set of observations, where the height of a bar is determined by the
value of the data point. For most ordinary comparisons, we
recommend the simplest bar plot with the zero base. For more
complicated analysis, you may wish to display grouped bar plots,
stacked bar plots, or plots with error bars.

Vertical bar plot with zero base

guiPlot("Bar Zero Base", DataSetValues =
    data.frame(as.factor(c("A","B")), c(-20,70)))

Vertical bar plot with Y minimum base

guiPlot("Bar Y Min Base", DataSetValues =
    data.frame(as.factor(c("A","B")), c(-20,70)))

Vertical grouped bar plot

guiPlot("Grouped Bar", DataSetValues =
    data.frame(as.factor(c("A","B")), c(20,70), c(30,80)))
guiModify("BarPlot", Name = guiGetPlotName(),
    BarBase = "Zero")

Vertical stacked bar plot

guiPlot("Stacked Bar", DataSetValues =
    data.frame(as.factor(c("A","B")), c(20,70), c(30,80)))

Horizontal bar plot

guiPlot("Horiz Bar", DataSetValues =
    data.frame(c(20,70), as.factor(c("A","B"))))

Horizontal grouped bar plot

guiPlot("Grouped Horiz Bar", DataSetValues =
    data.frame(c(30,80), c(20,70), as.factor(c("A","B"))))
guiModify("BarPlot", Name = guiGetPlotName(),
    BarBase = "Zero")

Horizontal stacked bar plot

guiPlot("Stacked Horiz Bar", DataSetValues =
    data.frame(c(30,80), c(20,70), as.factor(c("A","B"))))

Vertical bar plot with error

guiPlot("Bar with Error", DataSetValues =
    data.frame(as.factor(c("A","B")), c(20,70), c(3,6)))

Vertical grouped bar plot with error

guiPlot("Grouped Bar with Error")
guiModify("BarPlot", Name = guiGetPlotName(),
    xValues = as.factor(c("A","B")),
    yValues = data.frame(c(20,70), c(30,80)),
    zValues = data.frame(c(3,3), c(10,10)))

The HiLowPlot Class

The HiLowPlot class contains two types of plots: the high-low plot and
the candlestick plot. A high-low plot typically displays lines indicating
the daily, monthly, or yearly extreme values in a time series. These
kinds of plots can also include average, opening, and closing values,
and are referred to as high-low-open-close plots in these cases.
Meaningful high-low plots can thus display from three to five
columns of data, and illustrate simultaneously a number of important
characteristics about time series data. Because of this, they are most
often used to display financial data.
One variation on the high-low plot is the candlestick plot. Where
typical high-low plots display the opening and closing values of a
financial series with lines, candlestick plots use filled rectangles. The
color of the rectangle indicates whether the difference is positive or
negative. In S-PLUS, cyan rectangles represent positive differences,
when closing values are larger than opening values. Dark blue
rectangles indicate negative differences, when opening values are
larger than closing values.

High-low-open-close plot

dow <- djia[positions(djia) >= timeDate("09/01/87") &
    positions(djia) <= timeDate("11/01/87"), ]
guiPlot("High Low", DataSet = "dow",
    Columns = "Positions, open, close, high, low")

Candlestick plot

guiPlot("Candlestick", DataSet = "dow",
    Columns = "Positions, open, close, high, low")

The BoxPlot Class

The BoxPlot class contains box plots that show the center and spread
of a data set as well as any outlying data points. In the editable
graphics system, box plots can be created for a single variable or a
grouped variable.

Vertical box plot

guiPlot("Box", DataSetValues = data.frame(util.earn))

Horizontal box plot

guiPlot("Horizontal Box", DataSetValues =
    data.frame(util.earn))

Vertical grouped box plot

guiPlot("Box", DataSet = "fuel.frame",
    Columns = "Type, Mileage")

Horizontal grouped box plot

guiPlot("Horizontal Box", DataSet = "fuel.frame",
    Columns = "Type, Mileage")

The AreaPlot Class

The AreaPlot class contains a single plot type that displays area plots.
An area chart fills the space between adjacent series with color. It is
most useful for showing how each series in a data set affects the whole
over time.

Area plot

guiPlot("Area", DataSetValues =
data.frame(car.time, car.gals))

The QQPlot Class

The QQPlot class produces quantile-quantile plots, or qqplots, which
are extremely powerful tools for determining good approximations to
the distributions of data sets. In a one-dimensional qqplot, the
ordered data are graphed against quantiles of a known theoretical
distribution. If the data points are drawn from the theoretical
distribution, the resulting plot is close to the line y = x in shape. The
normal distribution is often the distribution used in this type of plot,
giving rise to the plot type "QQ Normal". In a two-dimensional qqplot,
the ordered values of the variables are plotted against each other. If
the variables have the same distribution shape, the points in the
qqplot cluster along a straight line.
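
To make the one-dimensional case concrete, the following sketch computes the coordinates of a normal qqplot by hand with standard S functions. It is an illustration of the idea rather than a description of how the QQPlot class performs its calculation; the sample data are random and the variable names are arbitrary.

```s
# Coordinates of a one-dimensional normal qqplot, computed by hand.
x <- rnorm(25)                      # sample data, for illustration only
n <- length(x)
theoretical <- qnorm(ppoints(n))    # quantiles of the standard normal
sortedData <- sort(x)               # ordered data values
```

Plotting sortedData against theoretical, for example with guiPlot("Scatter", DataSetValues = data.frame(theoretical, sortedData)), should give points that lie close to a straight line when the data are approximately normal.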

QQ normal plot

# Two data sets compared with the normal distribution.
guiPlot("QQ Normal", DataSetValues =
    data.frame(rnorm(25), runif(25)))

QQ plot

# Two data sets plotted against each other.
guiPlot("QQ", DataSetValues =
    data.frame(rnorm(25), runif(25)))

# One data set compared with the Chi-square distribution.
guiPlot("QQ", DataSetValues = data.frame(rchisq(20,5)))
guiModify("QQPlot", Name = guiGetPlotName(),
    Function = "Chi-Squared", df1 = "5")

The PPPlot Class

The PPPlot class produces probability plots. A one-dimensional
probability plot is similar to a qqplot except that the ordered data
values are plotted against the quantiles of a cumulative probability
distribution function. If the hypothesized distribution adequately
describes the data, the plotted points fall approximately along a
straight line. In a two-dimensional probability plot, the observed
cumulative frequencies of both sets of data values are plotted against
each other; if the data sets have the same distribution shape, the
points in the plot cluster along the line y = x .

PP normal plot

guiPlot("PP Normal", DataSetValues = data.frame(rnorm(25)))

PP plot

# Two data sets plotted against each other.
guiPlot("PP", DataSetValues =
    data.frame(rnorm(25), runif(25)))

# One data set compared with the Chi-square distribution.
guiPlot("PP", DataSetValues = data.frame(rchisq(20,5)))
guiModify("PPPlot", Name = guiGetPlotName(),
    Function = "Chi-Squared", df1 = "5")

The ParetoPlot Class

The ParetoPlot class displays Pareto charts, which are essentially
specialized histograms. A Pareto chart orders the bars in a histogram
from the most frequent to the least frequent, and then overlays a line
plot to display the cumulative percentages of the categories. This type
of plot is most useful in quality control analysis, where it is generally
helpful to focus resources on the problems that occur most frequently.
In the examples below, we use the data set exqcc2 that is located in
the samples\Documents\exqcc2.sdd file under your S-PLUS home
directory.
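
The ordering and cumulative percentages that a Pareto chart displays can be sketched with a few lines of standard S code. The category names and counts below are made up for illustration and are not taken from exqcc2.

```s
# Hypothetical defect counts by category (illustrative values only).
counts <- c(Scratch = 12, Dent = 30, Chip = 5, Stain = 3)

sortedCounts <- rev(sort(counts))    # most frequent category first
cumPct <- 100 * cumsum(sortedCounts) / sum(sortedCounts)
cumPct                               # 60, 84, 94, 100
```

Here the most frequent category alone accounts for 60 percent of the defects, which is the kind of conclusion the overlaid line in a Pareto chart is meant to support.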


Vertical Pareto plot

data.restore(paste(getenv("SHOME"),
"samples/Documents/exqcc2.sdd", sep = "/"))
guiPlot("Pareto", DataSet = "exqcc2",
Columns = "NumSample, NumBad")

Horizontal Pareto plot

guiPlot("Horizontal Pareto Plot", DataSet = "exqcc2",
    Columns = "NumBad, NumSample")

The Histogram Class

The Histogram class creates histograms and density plots for
one-dimensional data. Histograms display the number of data points that
fall in each of a specified number of intervals. A density plot displays
an estimate of the underlying probability density function for a data
set and allows you to approximate the probability that your data fall
in any interval. A histogram gives an indication of the relative density
of the data points along the horizontal axis. For this reason, density
plots are often superposed with (scaled) histograms.

Histogram

guiPlot("Histogram", DataSetValues = data.frame(util.earn))

Density plot

guiPlot("Density", DataSetValues = data.frame(util.earn))

Histogram with density plot

guiPlot("Histogram Density", DataSetValues =
    data.frame(util.earn))

The PiePlot Class

The PiePlot class displays pie charts, which show the share of
individual values in a variable relative to the sum total of all the
values. The size of a pie wedge is relative to a sum, and does not
directly reflect the magnitude of the data value. Because of this, pie
charts are most useful when the emphasis is on an individual item’s
relation to the whole; in these cases, the sizes of the pie wedges are
naturally interpreted as percentages.


Pie chart

guiPlot("Pie", DataSetValues =
data.frame(table(fuel.frame$Type)))

The ErrorBarPlot Class

The ErrorBarPlot class includes error bar plots, which display a range
of error around each plotted data point.

Vertical error bars

guiPlot("Error Bar", DataSetValues =
    data.frame(as.factor(c("A","B")), c(20,70), c(3,6)))

Horizontal error bars

guiPlot("Horiz Error Bar", DataSetValues =
    data.frame(c(20,70), as.factor(c("A","B")), c(3,6)))

Vertical and horizontal error bars

guiPlot("Error Bar - Both", DataSetValues =
    data.frame(c(20,43), c(20,70), c(3,6), c(5,8)))

The ContourPlot Class

The ContourPlot class displays contour plots and level plots. A
contour plot is a representation of three-dimensional data in a flat,
two-dimensional plane. Each contour line represents a height in the z
direction from the corresponding three-dimensional surface. A level
plot is essentially identical to a contour plot, but it has default options
that allow you to view a particular surface differently.

Contour plot

guiPlot("Contour", DataSet = "exsurf",
Columns = "V1, V2, V3")

Filled contour plot

guiPlot("Filled Contour", DataSet = "exsurf",
Columns = "V1, V2, V3")

Level plot

guiPlot("Levels", DataSet = "exsurf",
Columns = "V1, V2, V3")


The VectorPlot Class

The VectorPlot class contains the vector plot type, which uses arrows
to display the direction and velocity of flow at particular positions in a
two-dimensional plane. To create a vector plot, specify two columns
of data for the positions of the arrows, a third column of data for the
angle values (direction), and a fourth column of data for the
magnitude (length). In the example below, we use the data set
exvector that is located in the samples\Documents\exvector.sdd
file under your S-PLUS home directory.

Vector plot

data.restore(paste(getenv("SHOME"),
"samples/Documents/exvector.sdd", sep = "/"))
guiPlot("Vector", DataSet = "exvector",
Columns = "x, y, angle, mag")

The CommentPlot Class

The CommentPlot class contains the comment plot type, which displays
character labels on a two-dimensional graph. You can use comment
plots to display character data, plot combinations of characters as
symbols, produce labeled scatter plots, and create tables. To create a
comment plot, specify two columns of data for the position of each
comment and a third column for the text.

Comment plot

guiPlot("Comment", DataSetValues =
data.frame(x = 1:26, y = rnorm(26), z = LETTERS))

The SmithPlot Class

The SmithPlot class contains Smith plots, which are drawn in polar
coordinates. This type of plot is often used in microwave engineering
to show impedance characteristics. There are three types of Smith
plots: reflection, impedance, and circle. In a reflection plot, the x
values are magnitudes in the range [0,1] and the y values are angles
in degrees that are measured clockwise from the horizontal. In an
impedance plot, the x values are resistance data and the y values are
reactance data. In a circle plot, the x values are positive and specify
the distance from the center of the Smith plot to the center of the
circle you want to draw. The y values are angles that are measured
clockwise from the horizontal; the z values are radii and must also be
positive.


Smith plots

# Reflection plot.
guiPlot("Smith", DataSetValues =
data.frame(x = seq(from=0, to=1, by=0.1), y = 0:10))
guiModify("SmithPlot", Name = guiGetPlotName(),
AngleUnits = "Radians")

# Impedance plot.
guiPlot("Smith", DataSetValues =
data.frame(x = seq(from=0, to=1, by=0.1), y = 0:10))
guiModify("SmithPlot", Name = guiGetPlotName(),
DataType = "Impedance", AngleUnits = "Radians")

# Circle plot.
guiPlot("Smith", DataSetValues =
data.frame(x = seq(from=0, to=1, by=0.1), y = 0:10,
z = seq(from=0, to=1, by=0.1)))
guiModify("SmithPlot", Name = guiGetPlotName(),
DataType = "Circle", AngleUnits = "Radians")
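
The examples above set AngleUnits to "Radians". If your angles are recorded in degrees, one option is to convert them to radians before plotting and keep the same guiModify call. The following is a minimal sketch under that assumption; the names deg and refl.df are hypothetical.

```s
# Eleven angles in degrees, one for each magnitude below.
deg <- seq(from = 0, to = 180, by = 18)

# Magnitudes in the range [0,1]; angles converted from degrees to radians.
refl.df <- data.frame(x = seq(from = 0, to = 1, by = 0.1),
    y = deg * pi / 180)

# Reflection plot, with the angle units set as in the examples above.
guiPlot("Smith", DataSetValues = refl.df)
guiModify("SmithPlot", Name = guiGetPlotName(),
    AngleUnits = "Radians")
```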

The PolarPlot Class

The PolarPlot class displays line and scatter plots in polar
coordinates. To create a polar plot, specify magnitudes for the x values
in your data and angles (in radians) for the y values.

Polar line plot

guiPlot("Polar Line", DataSetValues = data.frame(
x = seq(from=0.1, to=2, by=0.1),
y = seq(from=0.5, to=10, by=0.5)))

Polar scatter plot

guiPlot("Polar Scatter", DataSetValues = data.frame(
x = seq(from=0.1, to=2, by=0.1),
y = seq(from=0.5, to=10, by=0.5)))
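
A polar line plot is also a convenient way to draw a curve given as magnitude versus angle. The following is a minimal sketch that traces one full turn of the curve r = 1 + 0.5*cos(theta), with magnitudes in the x column and angles in radians in the y column as described above. The curve and the names theta and r are chosen only for illustration.

```s
# Angles covering one full turn, in radians.
theta <- seq(from = 0, to = 2 * pi, length = 100)

# Magnitude at each angle; it stays positive over the whole turn.
r <- 1 + 0.5 * cos(theta)

# x holds the magnitudes and y holds the angles.
guiPlot("Polar Line", DataSetValues = data.frame(x = r, y = theta))
```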

The Plots3D Palette

The Plots3D palette contains a collection of three-dimensional plots.
Table 7.4 shows a quick description of the plot classes and the plots
that belong to each of them.


The last nine plots in the Plots3D palette are composite plots that do
not have their own classes. Instead, they are tools that allow you to
view plots we’ve discussed already in new and different ways. The
tools fall into two broad categories: rotated plots and conditioned plots.
We discuss each of these categories below.
Table 7.4: The plot types available in the Plots3D palette. The left column of the table gives the class that
each plot type belongs to.

Plot class    Description                 Available Plot Types

Line3DPlot    Line, scatter, drop-line,   3D Scatter, 3D Line, 3D Line Scatter,
              and regression plots.       Drop Line Scatter, 3D Regression,
                                          3D Reg Scatter.

SurfacePlot   Surface and bar plots.      Coarse Surface, Data Grid Surface,
                                          Spline Surface, Filled Coarse Surface,
                                          Filled Data Grid Surface, Filled Spline
                                          Surface, 8 Color Surface, 16 Color
                                          Surface, 32 Color Surface, 3D Bar.

ContourPlot   Contour plots. This class   3D Contour, 3D Filled Contour.
              contains both 2D and 3D
              contour plots. See Table
              7.3.

Grid3D        Projection planes.          This group of plots does not have
                                          formal plot types. The plots are listed
                                          in the Plots3D palette with the
                                          following names: XY Plane Z Min,
                                          XZ Plane Y Min, YZ Plane X Min,
                                          XY Plane Z Max, XZ Plane Y Max,
                                          YZ Plane X Max.

              Rotated plots.              This group of plots has neither a plot
                                          class nor a corresponding formal plot
                                          type. The plots are listed in the
                                          Plots3D palette with the following
                                          names: 2 Panel Rotation, 4 Panel
                                          Rotation, 6 Panel Rotation.

              Conditioned plots.          This group of plots has neither a plot
                                          class nor a corresponding formal plot
                                          type. The plots are listed in the
                                          Plots3D palette with the following
                                          names: Condition on X, Condition on Y,
                                          Condition on Z, No Conditioning,
                                          4 Panel Conditioning, 6 Panel
                                          Conditioning.

In the subsections below, we present examples for each of the plot
types listed in the table. The data set we use in the examples is created
as follows:

x <- ozone.xy$x
y <- ozone.xy$y
z <- ozone.median
ozone.df <- data.frame(x,y,z)

To familiarize yourself with this data set and the 3D plot types, first
create a mesh surface plot:

guiPlot("Data Grid Surface", DataSetValues = ozone.df)

Next, add the data points as a separate plot to the surface:

guiCreate("Line3DPlot", Name = "1$2",
xValues = x, yValues = y, zValues = z,
SymbolStyle = "Circle, Solid")

The Data Grid Surface is the first plot in the first graph of the graph
sheet. We give the plot of data points the name 1$2 to designate it as
the second plot in the first graph. For more details on naming
conventions for graph objects, see the section Object Path Names on
page 246.
You can use guiModify to rotate the axes:

guiModify("Graph3D", Name = guiGetGraphName(),
Rotate3Daxes = T)


Note that Rotate3Daxes is part of the properties for the graph type
Graph3D and not the plot type Line3DPlot; see the section Graphics
Objects on page 244 for details.
If you would like to see the surface again without the overlaid data
points, use the guiRemove function to remove the second plot:

guiRemove("Line3DPlot", Name = "1$2")

The Line3DPlot Class

The Line3DPlot class contains scatter and line plots that display
multidimensional data in three-dimensional space. Typically, static
3D scatter and line plots are not effective because the depth cues of
single points are insufficient to give strong 3D effects. On some
occasions, however, they can be useful for discovering simple
relationships between three variables. To improve the depth cues in a
3D scatter plot, you can add drop lines to each of the points; this gives
rise to the plot type "Drop Line Scatter". The 3D Regression plot
draws a regression plane through the data points.

Scatter plot

guiPlot("3D Scatter", DataSetValues = ozone.df)

Line plot

guiPlot("3D Line", DataSetValues = ozone.df)

Line with scatter plot

guiPlot("3D Line Scatter", DataSetValues = ozone.df)

Drop line scatter plot

guiPlot("Drop Line Scatter", DataSetValues = ozone.df)

Regression plot

guiPlot("3D Regression", DataSetValues = ozone.df)

Regression with scatter plot

guiPlot("3D Reg Scatter", DataSetValues = ozone.df)


The SurfacePlot Class

The SurfacePlot class includes different types of surface plots, which
are approximations to the shapes of three-dimensional data sets.
Spline surfaces are smoothed plots of gridded 3D data, and 3D bar
plots are gridded surfaces drawn with bars. For two variables, a 3D
bar plot produces a bivariate histogram that shows the joint
distribution of the data. A color surface plot allows you to specify
color fills for the bands or grids in your surface plot.

Coarse surface

guiPlot("Coarse Surface", DataSetValues = ozone.df)

Data grid surface

guiPlot("Data Grid Surface", DataSetValues = ozone.df)

Spline surface

guiPlot("Spline Surface", DataSetValues = ozone.df)

Coarse filled surface

guiPlot("Filled Coarse Surface", DataSetValues = ozone.df)

Data grid filled surface

guiPlot("Filled Data Grid Surface",
DataSetValues = ozone.df)

Filled spline surface

guiPlot("Filled Spline Surface", DataSetValues = ozone.df)

Eight color draped surface

guiPlot("8 Color Surface", DataSetValues = ozone.df)

Sixteen color draped surface

guiPlot("16 Color Surface", DataSetValues = ozone.df)


Thirty-two color draped surface

guiPlot("32 Color Surface", DataSetValues = ozone.df)

Bar plot

guiPlot("3D Bar", DataSetValues = ozone.df)

The ContourPlot Class

The 3D contour plots are identical to 2D contour plots, except that
the contour lines are drawn in three-dimensional space instead of on
a flat plane. For more details, see the section The ContourPlot Class
on page 270.

Contour plot

guiPlot("3D Contour", DataSetValues = ozone.df)

Filled contour plot

guiPlot("3D Filled Contour", DataSetValues = ozone.df)

The Grid3D Class

The Grid3D class contains a set of two-dimensional planes you can use
either on their own or overlaid on other 3D plots. The class is
separated into six plots according to which axis a plane intersects and
where. For example, the plot created by the XY Plane Z Min button
in the Plots3D palette intersects the z axis at its minimum.
The plots in the Grid3D class do not have their own plot types.