S Programming
Programmer’s Guide
July 2001
Insightful Corporation
Seattle, Washington
Proprietary Notice
Insightful Corporation owns both this software program and its
documentation. Both the program and documentation are
copyrighted with all rights reserved by Insightful Corporation.
The correct bibliographical reference for this document is as follows:
S-PLUS 6 for Windows Programmer’s Guide, Insightful Corporation,
Seattle, WA.
Printed in the United States.
ACKNOWLEDGMENTS
S-PLUS would not exist without the pioneering research of the Bell
Labs S team at AT&T (now Lucent Technologies): John Chambers,
Richard A. Becker (now at AT&T Laboratories), Allan R. Wilks (now
at AT&T Laboratories), Duncan Temple Lang, and their colleagues in
the statistics research departments at Lucent: William S. Cleveland,
Trevor Hastie (now at Stanford University), Linda Clark, Anne
Freeny, Eric Grosse, David James, José Pinheiro, Daryl Pregibon, and
Ming Shyu.
Insightful Corporation thanks the following individuals for their
contributions to this and earlier releases of S-PLUS: Douglas M. Bates,
Leo Breiman, Dan Carr, Steve Dubnoff, Don Edwards, Jerome
Friedman, Kevin Goodman, Perry Haaland, David Hardesty, Frank
Harrell, Richard Heiberger, Mia Hubert, Richard Jones, Jennifer
Lasecki, W.Q. Meeker, Adrian Raftery, Brian Ripley, Peter
Rousseeuw, J.D. Spurrier, Anja Struyf, Terry Therneau, Rob
Tibshirani, Katrien Van Driessen, William Venables, and Judy Zeh.
Chapter 1
THE S-PLUS LANGUAGE
Introduction to S-PLUS
    Interpreted vs. Compiled Languages
    Object-Oriented Programming
    Versions of the S Language
    Programming Tools in S-PLUS
Syntax of S-PLUS Expressions
    Names and Assignment
    Subscripting
Data Classes
The S-PLUS Programming Environment
    Editing Objects
    Functions and Scripts
    Transferring Data Objects
Graphics Paradigms
    Editable Graphics
    Traditional Graphics
    Traditional Trellis Graphics
    Converting Non-editable Graphics to Editable Graphics
    When to Use Each Graphics System
Chapter 1 The S-PLUS Language
INTRODUCTION TO S-PLUS
S-PLUS is a language specially created for exploratory data analysis
and statistics. You can use S-PLUS productively and effectively without
even writing a one-line program in the S-PLUS language. However,
most users begin programming in S-PLUS almost subconsciously—
defining functions to streamline repetitive computations, avoid typing
mistakes in multi-line expressions, or simply to keep a record of a
sequence of commands for future use. The next step is usually
incorporating flow-of-control features to reduce repetition in these
simple functions. From there it is a relatively short step to the creation
of entirely new modules of S-PLUS functions, perhaps building on the
object-oriented features that allow you to define new classes of objects
and methods to handle them properly.
In this book, we concentrate on describing how to use the language.
As with any good book on programming, the goal of this book is to
help you quickly produce useful S-PLUS functions, and then step back
and delve more deeply into the internals of the S-PLUS language.
Along the way, we will continually touch on those aspects of S-PLUS
programming that are either particularly effective (such as vectorized
arithmetic) or particularly troubling (memory use, for loops).
This chapter aims to familiarize you with the language, starting with a
comparison of interpreted and compiled languages. We then briefly
describe object-oriented programming as it relates to S-PLUS,
although a full discussion is deferred until Chapter 10, Object-
Oriented Programming in S-PLUS. We then describe the basic syntax
and data types in S-PLUS. Programming in S-PLUS does not require,
but greatly benefits from, programming tools such as text editors and
source control. We touch on these tools briefly in the section The
S-PLUS Programming Environment (page 14). Finally, we introduce
the various graphics paradigms, and discuss when each should be
used.
Note
This book is intended for use with the S-PLUS Professional Edition. The full functionality of the
S-PLUS language, described in these pages, is not available to Axum or S-PLUS Standard Edition
users.
on objects of that type. You then define the actions specifically for that
type of object. Typically, the first such action is to create instances of
the type.
Suppose, for example, that you start thinking about some graphical
objects, more specifically, circles on the computer screen. You want to
be able to create circles, but you also want to be able to draw them,
redraw them, move them, and so on.
Using the object-oriented approach to programming, you would
define a class of objects called circle, then define a function for
generating circles. (Such functions are called generator functions.) What
about drawing, redrawing, and moving? All of these are actions that
may be performed on a wide variety of objects, but may well need to
be implemented differently for each. An object-oriented approach,
therefore, defines the actions generically, with generic functions called
draw, redraw, move, and so on.
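As a rough illustration of the circle example, the sketch below uses the S Version 4 class functions setClass, new, setGeneric, and setMethod. The slot names x, y, and r, the generator function circle, and the behavior chosen for move are assumptions made for this example only; they are not taken from the original text.

```s
# A class "circle" with three numeric slots (slot names are illustrative)
setClass("circle", representation(x="numeric", y="numeric", r="numeric"))

# Generator function: builds a circle from its center and radius
circle <- function(x, y, r) new("circle", x=x, y=y, r=r)

# A generic function, plus the method that handles circles.
# Here "moving" simply returns a new circle with a shifted center.
setGeneric("move", function(object, dx, dy) standardGeneric("move"))
setMethod("move", "circle",
    function(object, dx, dy)
        circle(object@x + dx, object@y + dy, object@r))

c1 <- circle(0.5, 0.5, 1.5)
c2 <- move(c1, 2.5, 3.5)
c2@x    # center shifted from 0.5 to 3
```

Other classes (a square, say) could supply their own move method without changing any code that calls move.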
Versions of the S Language
There are currently two distinct versions of the S language in
common use: the S Version 3 language that underlies S-PLUS 2000 for
Windows (and all earlier versions of S-PLUS for Windows, as well as
UNIX versions of S-PLUS from 3.0 to 3.4) and the S Version 4
language that underlies S-PLUS 5.0 and later on UNIX and S-PLUS 6
for Windows and later.
The S Version 3 language (referred to in this document as SV3)
introduced the modeling language that is the foundation for most
S-PLUS statistical and analytic functionality. It had a simple object-
oriented structure with a dispatch mechanism built on naming
conventions. It did not apply any class structure to existing S-PLUS
objects such as vectors and matrices.
Programming Tools in S-PLUS
There are two main tools for developing S-PLUS programs: the
Commands window and Script windows. The Commands window
will be familiar to all users of S-PLUS prior to version 4. Only one
Commands window can be open at a time; the easiest way to open it is
to click its Standard toolbar button.
Figure 1.1: The Commands window button, found on the Standard toolbar.
> plot(corn.rain)
If you type in examples from the text, or cut and paste examples from
the on-line manuals, be sure to omit the prompt character. To exit the
Commands window, simply use the close window tool on the top
right of the window. The command
> q()
Syntax of S-PLUS Expressions
> sqrt
function(x)
.Call("S_c_use_method", "sqrt")
Note
This definition applies to syntactic names, that is, names recognized by the S-PLUS interpreter as
names. S-PLUS provides a mechanism by which virtually any character string, including non-
syntactic names, can be supplied as the name of the data object. This mechanism is described in
Chapter 20, Data Management.
> plot(corn.rain)
> mean(corn.rain)
[1] 10.78421
> 2 + 7
[1] 9
> 12.4 / 3
[1] 4.133333
Names and Assignment
One of the most frequently used infix operators is the assignment
operator <- (and its equivalents, the equal sign, =, and the
underscore, _) used to associate names and values. For example, the
expression
> aba <- 7
associates the value 7 with the name aba. The value of an assignment
expression is the assigned value, that is, the value on the right side of
the assignment arrow. Assignment suppresses automatic printing, but
you can use the print function to force S-PLUS to print the
expression’s value as follows:
> print(aba <- 7)
[1] 7
The value on the right of the assignment arrow can be any S-PLUS
expression; the left side can be any syntactic name or character string.
There are a few reserved names, such as if and function.¹
Assignments typed at the S-PLUS prompt are permanent; objects
created in this way endure from session to session, until removed.
1. The complete list is as follows: if, is, else, for, while, repeat,
next, break, in, function, return, TRUE, T, FALSE, F, NULL, NA,
Inf, NaN.
> letters[3]
[1] "c"
> letters[-3]
[1] "a" "b" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n"
[14] "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"
DATA CLASSES
Everything in S-PLUS is an object. All objects have a class. An S-PLUS
expression (itself an object) is interpreted by the S-PLUS evaluator and
returns a value, another object that can be assigned a name. An
object’s class determines the representation of the object, that is, what
types of information can be found within the object, and where that
information can be found. Most information about an object is
contained within specialized structures called slots.
The simplest data objects are one-dimensional arrays called vectors,
consisting of any number of elements corresponding to individual data
points. The simplest elements are literal expressions that, either singly
or matched like-with-like, produce the following classes:
• logical: The values T (or TRUE) and F (or FALSE).
• integer: Integer values such as 3 or -4.
• numeric: Floating-point real numbers (double-precision by
default). Numerical values can be written as whole numbers
(for example, 3., -4.), decimal fractions (4.52, -6.003), or in
scientific notation (6.02e23, 8e-47).
• complex: Complex numbers of the form a + bi, where a and
b are integers or numeric (for example, 3 + 1.23i).
• character: Character strings enclosed in quotes (for example,
"hello").
> 7.4
[1] 7.4
> "hello"
[1] "hello"
> c(T,F,T)
[1] T F T
> c(8.3, 9.2, 11)
[1] 8.3 9.2 11.0
You can obtain the class and length of any data object using the class
and length functions, respectively:
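For instance, a minimal sketch using a made-up vector v (the name v is not from the original text):

```s
# A vector of three floating-point values
v <- c(8.3, 9.2, 11)
class(v)     # "numeric"
length(v)    # 3
```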
The most generally useful of the recursive data types is the list,
which can be used to combine arbitrary collections of
S-PLUS data objects into a single object. For example, suppose you
have a vector x of character data, a matrix y of logical data, and a
time series z as shown below:
You can combine these into a single S-PLUS data object (of class
"list") using the list function:
2. This statement about coercion applies strictly only to the five simple
classes described on page 11. These simple classes correspond
roughly to what S version 3 and earlier referred to as modes.
(Although objects of class "integer" have mode "numeric".) The
concept of modes persists in S version 4, but it has been almost
entirely superseded by the new class mechanism.
$y:
[,1] [,2]
[1,] T T
[2,] F F
$z:
1989: 0.841470985 0.909297427 0.141120008 -0.756802495
1993: -0.958924275 -0.279415498 0.656986599 0.989358247
1997: 0.412118485 -0.544021111 -0.999990207 -0.536572918
2001: 0.420167037 0.990607356 0.650287840 -0.287903317
2005: -0.961397492 -0.750987247 0.149877210 0.912945251
2009: 0.836655639 -0.008851309 -0.846220404 -0.905578362
2013: -0.132351750 0.762558450 0.956375928 0.270905788
2017: -0.663633884 -0.988031624 -0.404037645 0.551426681
2021: 0.999911860 0.529082686 -0.428182669 -0.991778853
Editing Objects
You can edit S-PLUS data by using the fix function.
> fix(x)
The fix function uses an editor you specify with the S-PLUS editor
option. At the S-PLUS prompt, type the following:
> options(editor="editor")
where editor is the binary executable (.exe) that runs your favorite text
editor. To set this option for each S-PLUS session, add the expression
to your .First function. This option defaults to Notepad in S-PLUS.
Once you’ve set up S-PLUS to work with your favorite editor, writing
and testing new functions requires following the simple sequence of
writing the function, running the function, editing the function, and so
on.
GRAPHICS PARADIGMS
In S-PLUS there are three basic graphics paradigms, which we will
refer to as Editable Graphics, Traditional Graphics, and Traditional
Trellis Graphics.
> graphsheet(object.mode="object-oriented")
Editable Graphics
Editable graphics are new to S-PLUS version 4. They have been
developed based on modern C++ object-oriented programming
structures. As such they are based on a model of creating an object of
a particular class with properties containing a description of the
object. The user edits the object by modifying its properties. Multiple
graphics objects form an object hierarchy of plots within graphs
within Graph sheets which together represent a graphic.
Programmers accustomed to this type of object-oriented programming
will prefer to program by creating and modifying editable graphics
objects. Users of previous versions of S-PLUS may want to transition
towards using editable graphics when doing so provides benefits not
available with the traditional graphics, and continue to use traditional
graphics when they can leverage their existing experience to get
superior results.
Chapter 2
DATA OBJECTS
Introduction
Vectors
    Coercion of Values
    Creating Vectors
    Naming Vector Elements
Structures
    Matrices
    Arrays
Lists
    Creating Lists
    Naming Components
Factors and Ordered Factors
    Creating Factors
    Creating Ordered Factors
    Creating Factors From Continuous Data
Chapter 2 Data Objects
INTRODUCTION
When using S-PLUS, you should think of your data sets as data objects
belonging to a certain class. Each class has a particular representation,
often defined as a named list of slots. Each slot, in turn, contains an
object of some other class.
The class of an object defines how the object is represented and
determines what actions may be performed on the object and how
those actions are performed. Among the most common classes of data
objects are numeric, character, factor, list, and data.frame.
The simplest type of data object in S-PLUS is the atomic vector, a one-
way array of n elements of a single mode (for example, numbers) that
can be indexed numerically. Atomic vectors are so called to indicate
that in S-PLUS they are indeed fundamental objects. All of S-PLUS’s
basic mathematical operations and data manipulation functions are
designed to work on the vector as a whole, although individual
elements of the vector can be extracted using their numerical indices.
More complicated data objects can be constructed from atomic
vectors in one of two basic ways:
1. By allowing complete S objects as elements, or
2. By building new data classes from old using slots
Objects that contain other S objects as elements are called recursive
objects and include such common S-PLUS objects as lists and data
frames. A list is a vector for which each element is a distinct S object,
of any type. A data frame is essentially a list in which each of the
elements is an atomic vector, and all of the elements have the same
length. With slots, you can uniquely define a new class of data object
by storing the defining information (that is, the object’s attributes) in
one or more slots.
Data objects can contain not only logical, numeric, complex, and
character values, but also functions, operators, function calls, and
evaluations. All the different types (classes) of S-PLUS objects can be
manipulated in the same way: saved, assigned, edited, combined, or
passed as arguments to functions. This general definition of data
objects, coupled with class-specific methods, forms the backbone of
object-oriented programming and provides exceptional flexibility in
extending the capabilities of S-PLUS.
VECTORS
The simplest type of data object in S-PLUS is a vector. A vector is
simply an ordered set of values. The order of the values is emphasized
because ordering provides a convenient way of extracting the parts of
a vector. To extract individual elements, use their numerical indices
with the subscript operator [:
> car.gals[c(1,3,5)]
[1] 13.3 11.5 14.3
All elements within an atomic vector must be from only one of seven
atomic modes—logical, numeric, single, integer, complex, raw, or
character. (An eighth atomic mode, NULL, applies only to the NULL
vector.) The number of elements and their mode completely define
the data object as a vector. The class of any vector is the mode of its
elements:
> class(c(T,T,F,T))
[1] "logical"
> class(c(1,2,3,4))
[1] "integer"
> class(c(1.24,3.45, pi))
[1] "numeric"
> length(1:10)
[1] 10
Coercion of Values
When values of different modes are combined into a single atomic
object, S-PLUS converts, or coerces, all values to a single mode in a way
that preserves as much information as possible. The basic modes can
be arranged in order of increasing information—logical, integer,
numeric, complex, and character. Thus, mixed values are all
converted to the mode of the value with the most informative mode.
For example, suppose we combine a logical value, a numeric value,
and a character value, as follows:
S-PLUS coerces all three values to mode character because this is the
most informative mode represented. Similarly, in the following
example, all the values are coerced to mode numeric:
When logical values are coerced to integers, TRUE values become the
integer 1 and FALSE values become the integer 0.
The same kind of coercion occurs when values of different modes are
combined in computations. For example, logical values are coerced
to zeros and ones in integer or numeric computations.
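The coercion rules described above can be checked with a short sketch. The object names mixed and nums and the values used are made up for illustration.

```s
# Logical, numeric, and character combined: everything becomes character
mixed <- c(TRUE, 3.5, "abc")
mode(mixed)                   # "character"

# Logical and numeric combined: everything becomes numeric
nums <- c(TRUE, 2.5, 10)
mode(nums)                    # "numeric"

# In arithmetic, logical values count as ones and zeros
sum(c(TRUE, FALSE, TRUE))     # 2
```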
> rep(NA,5)
[1] NA NA NA NA NA
> rep(c(T,T,F),2)
[1] T T F T T F
If times is a vector with the same length as the vector of values being
repeated, each value is repeated the corresponding number of times.
> rep(c("yes","no"),c(4,2))
[1] "yes" "yes" "yes" "yes" "no" "no"
> 1:5
[1] 1 2 3 4 5
> 1.2:4
[1] 1.2 2.2 3.2
> 1:-1
[1] 1 0 -1
> seq(-pi,pi,.5)
[1] -3.1415927 -2.6415927 -2.1415927 -1.6415927 -1.1415927
[6] -0.6415927 -0.1415927 0.3584073 0.8584073 1.3584073
[11] 1.8584073 2.3584073 2.8584073
You can specify the length of the vector and seq computes the
increment:
> seq(-pi,pi,length=10)
[1] -3.1415927 -2.4434610 -1.7453293 -1.0471976 -0.3490659
[6] 0.3490659 1.0471976 1.7453293 2.4434610 3.1415927
Or, you can specify the beginning, the increment, and the length with
either the length argument or the along argument:
> seq(1,by=.05,length=10)
[1] 1.00 1.05 1.10 1.15 1.20 1.25 1.30 1.35 1.40 1.45
> seq(1,by=.05,along=1:5)
[1] 1.00 1.05 1.10 1.15 1.20
See the help file for seq for more information on the length and
along arguments.
> vector("logical",3)
[1] F F F
Naming Vector Elements
You can assign names to vector elements to associate specific
information, such as case labels or value identifiers, with each value of
the vector. To create a vector with named values, you assign the
names with the names function:
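A minimal sketch of naming vector elements follows. The vector scores and its labels are hypothetical and serve only to show the pattern.

```s
# Hypothetical data: three scores labelled by student
scores <- c(71.5, 85.5, 90.5)
names(scores) <- c("ann", "bob", "cal")

names(scores)      # the labels just assigned
scores["bob"]      # a named element can be extracted by its name
```

Once names are assigned, elements can be extracted either by name or by numerical index.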
STRUCTURES
Next in complexity after the atomic vectors are the structures, which,
as the name implies, extend vectors by imposing a structure, typically
a multi-dimensional array, upon the data.
The simplest structure is the two-dimensional matrix. A matrix starts
with a vector and then adds the information about how many rows
and columns the matrix contains. This information, the dimension, or
dim, of the matrix, is stored in a slot in the representation of the
matrix class. All structure classes have at least one slot, .Data, which
must contain a vector. The classes matrix and array have one
additional required slot, .Dim, to hold the dimension and one optional
slot, .Dimnames, to hold the names for the rows and columns of a
matrix and their analogues for higher dimensional arrays. Like simple
vectors, structure objects are atomic, that is, all of their values must be
of a single mode.
Creating Matrices
To create a matrix from an existing vector, use the dim function to set
the .Dim slot. To use dim, you assign a vector of two integers
specifying the number of rows and columns. For example:
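As a sketch of this step, the hypothetical vector mat below (not from the original text) is reshaped into three rows and four columns. The values fill the matrix one column at a time.

```s
# Hypothetical vector of 12 values
mat <- 1:12
dim(mat) <- c(3, 4)    # 3 rows, 4 columns

dim(mat)
mat[2, 3]              # row 2, column 3: the value 8
```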
[2,] 1 2 3 4
[3,] 1 2 3 4
> rbind(c(200688,24,33),c(201083,27,115))
       [,1] [,2] [,3]
[1,] 200688   24   33
[2,] 201083   27  115
> matrix(1:12,ncol=3,byrow=T)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
[4,]   10   11   12
Naming Rows and Columns
For a vector you saw that you could assign names to each value with
the names function. For matrices, you can assign names to the rows
and columns with the dimnames function. To create a matrix with row
and column names of your own, create a list with two components,
one for rows and one for columns, and assign them using the
dimnames function.
To suppress either row or column labels, use the NULL value for the
corresponding component of the list. For example, to suppress the
row labels and number the columns:
To specify the row and column labels when defining a matrix with
matrix, use the optional argument dimnames as follows:
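The two ways of labelling a matrix can be sketched as follows. The matrix m and its labels are invented for this illustration.

```s
# Hypothetical 2 x 3 matrix, with labels supplied when it is created
m <- matrix(1:6, nrow=2,
    dimnames=list(c("r1", "r2"), c("a", "b", "c")))
m["r2", "b"]           # row "r2", column "b": the value 4

# Replace the labels, using NULL to suppress the row labels
dimnames(m) <- list(NULL, c("a", "b", "c"))
dimnames(m)[[2]]       # the column labels remain
```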
Arrays
Arrays generalize matrices by extending the .Dim slot to more than
two dimensions. If the rows and columns of a matrix are the length
and width of a rectangular arrangement of equal-sized cubes, then
length, width, and height represent the dimensions of a three-way
array. You can visualize a series of equal-sized rectangles or cubes
stacked one on top of the other to form a three-dimensional box. The
box is composed of cells (the individual cubes) and each cell is
specified by its position along the length, width, and height of the
box.
An example of a three-dimensional array is the iris data set in
S-PLUS. The first two cases are presented here:
> iris[1:2,,]
, , Setosa
Sepal L. Sepal W. Petal L. Petal W.
[1,] 5.1 3.5 1.4 0.2
[2,] 4.9 3.0 1.4 0.2
, , Versicolor
Sepal L. Sepal W. Petal L. Petal W.
[1,] 7.0 3.2 4.7 1.4
[2,] 6.4 3.2 4.5 1.5
, , Virginica
Sepal L. Sepal W. Petal L. Petal W.
[1,] 6.3 3.3 6.0 2.5
[2,] 5.8 2.7 5.1 1.9
The data present 50 observations of sepal length and width and petal
length and width for each of three species of iris (Setosa, Versicolor,
and Virginica). The .Dim slot of iris represents the length, width, and
height in the box analogy:
> dim(iris)
[1] 50 4 3
Creating Arrays
To create an array in S-PLUS, use the array function. The array
function is analogous to matrix. It takes data and the appropriate
dimensions as arguments to produce the array. If no data are
supplied, the array is filled with NAs.
When passing values to array, combine them in a vector so that the
first dimension varies fastest, the second dimension the next fastest,
and so on. The following example shows how this works:
> array(c(1:8,11:18,111:118),dim=c(2,4,3))
, , 1
     [,1] [,2] [,3] [,4]
[1,]    1    3    5    7
[2,]    2    4    6    8
, , 2
     [,1] [,2] [,3] [,4]
[1,]   11   13   15   17
[2,]   12   14   16   18
, , 3
     [,1] [,2] [,3] [,4]
[1,]  111  113  115  117
[2,]  112  114  116  118
> vec
[1] 1 2 3 4 5 6 7 8 11 12 13
[12] 14 15 16 17 18 111 112 113 114 115 116
[23] 117 118
> dim(vec) <- c(2,4,3)
LISTS
A list is a completely flexible means for representing data. In earlier
versions of S, it was the standard means of combining arbitrary
objects into a single data object. Much the same effect can be created,
however, using the notion of slots.
Up to this point, all the data objects described have been atomic,
meaning they contain data of only one mode. Often, however, you
need to create objects that not only contain data of mixed modes but
also preserve the mode of each value.
For example, the slots of an array may contain both the dimension (a
numeric vector) and the .Dimnames slot (a character vector), and it is
important to preserve those modes:
> attributes(iris)
$dim:
[1] 50 4 3
$dimnames:
$dimnames[[1]]:
character(0)
$dimnames[[2]]:
[1] "Sepal L." "Sepal W." "Petal L." "Petal W."
$dimnames[[3]]:
[1] "Setosa" "Versicolor" "Virginica"
Creating Lists
To create a list, use the list function. Each argument to list defines
a component of the list. Naming an argument, using the form
name=component, creates a name for the corresponding component.
For example, you can create a list from the two vectors grp and thw as
follows:
$thw:
[1] 450 760 325 495 285 450 460 375 310 615 425 245 350
[14] 340 300 310 270 300 360 405 290
$descrip:
[1] "heart data"
> heart.list$group
[1] 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2
> heart.list[[1]]
[1] 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2
> heart.list[[1]][11:12]
[1] 1 2
or
> heart.list$group[11:12]
[1] 1 2
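The list operations in this section can be sketched end to end with a smaller, made-up list. The values of grp and thw below are short hypothetical stand-ins for the vectors used in the text, and small.list is a name chosen for this example.

```s
# Hypothetical stand-ins for the grp and thw vectors
grp <- c(1, 1, 2, 2)
thw <- c(450, 760, 340, 300)

# Components may be of different modes and lengths
small.list <- list(group=grp, thw=thw, descrip="heart data")

names(small.list)         # component names
small.list$thw            # a component, extracted by name
small.list[[1]]           # a component, extracted by position
small.list[[1]][2:3]      # elements 2 and 3 of the first component
small.list$group[2:3]     # the same elements, reached by name
```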
Factors and Ordered Factors
> fuel.frame$Type
[1] Small Small Small Small Small Small Small
[8] Small Small Small Small Small Small Sporty
[15] Sporty Sporty Sporty Sporty Sporty Sporty Sporty
[22] Sporty Compact Compact Compact Compact Compact Compact
[29] Compact Compact Compact Compact Compact Compact Compact
[36] Compact Compact Medium Medium Medium Medium Medium
[43] Medium Medium Medium Medium Medium Medium Medium
[50] Medium Large Large Large Van Van Van
[57] Van Van Van Van
When you print a factor, the values correspond to the level of the
factor for each data point or observation. Internally, a factor keeps
track of the levels or different categorical values contained in the data
and indices that point to the appropriate level for each data point.
The different levels of a factor are stored in an attribute called levels.
Factor objects are a natural form for categorical data in an object-
oriented programming environment because they have a class
attribute that allows specific method functions to be developed for
them. For example, the generic print function uses the print.factor
method to print factors. If you override print.factor by calling
print.default, you can see how a factor is stored internally.
> print.default(fuel.frame$Type)
[1] 4 4 4 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 1 1 1
[26] 1 1 1 1 1 1 1 1 1 1 1 1 3 3 3 3 3 3 3 3 3 3 3 3 3
[51] 2 2 2 6 6 6 6 6 6 6
attr(, "levels"):
[1] "Compact" "Large" "Medium" "Small" "Sporty" "Van"
attr(, "class"):
[1] "factor"
The integers serve as indices to the values in the levels attribute. You
can return the integer indices directly with the codes function.
> codes(fuel.frame$Type)
[1] 4 4 4 4 4 4 4 4 4 4 4 4 4 5 5 5 5 5 5 5 5 5 1 1 1
[26] 1 1 1 1 1 1 1 1 1 1 1 1 3 3 3 3 3 3 3 3 3 3 3 3 3
[51] 2 2 2 6 6 6 6 6 6 6
Or, you can examine the levels of a factor with the levels function.
> levels(fuel.frame$Type)
[1] "Compact" "Large" "Medium" "Small" "Sporty" "Van"
> levels(fuel.frame$Type)[codes(fuel.frame$Type)]
except that the quotes are dropped. To get the number of cases of
each level in a factor, call summary:
> summary(fuel.frame$Type)
Compact Large Medium Small Sporty Van
15 3 13 13 9 7
Creating Factors
To create a factor, use the factor function. The factor function takes
data with categorical values and creates a data object of class factor.
For example, you can categorize a group of 10 students by gender as
follows:
> factor(classlist)
[1] male female male male male female female male
[9] female male
S-PLUS creates two levels with labels female and male, respectively.
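A sketch of the full sequence follows. The definition of classlist is reconstructed here from the printed factor values above and is assumed to be a character vector; gender is a name introduced for this example.

```s
# Assumed definition of classlist, rebuilt from the printed output
classlist <- c("male", "female", "male", "male", "male",
    "female", "female", "male", "female", "male")

gender <- factor(classlist)
levels(gender)       # levels are sorted alphabetically: "female" "male"
summary(gender)      # number of students at each level
```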
Table 2.2: Arguments to factor.
Argument Description
The levels argument allows you to specify the levels you want to use
or to order them the way you want. For example, if you want to
include certain categories in an analysis, you can specify them with
the levels argument. Any values omitted from the levels argument
are considered missing.
If you had left the levels argument off, the levels would have been
ordered alphabetically as Hi, Low, Medium. You use the labels
argument if you want the levels to be something other than the
original data.
> factor(c("Hi","Lo","Med","Hi","Hi","Lo"),
+ levels=c("Lo","Hi"), labels = c("LowDose","HighDose"))
[1] HighDose LowDose NA HighDose HighDose LowDose
Warning
If you provide the levels and labels arguments, then you must order them in the same way. If
you don’t provide the levels argument but do provide the labels argument, then you must
order the labels the same way S-PLUS orders the levels of the factor, which is alphabetically for
character strings and numerically for a numeric vector that is converted to a factor.
> factor(c("Hi","Med","Lo","Hi","Hi","Lo"),
+ exclude =c("Med"))
[1] Hi NA Lo Hi Hi Lo
Creating Ordered Factors
If the order of the levels of a factor is important, you can represent the
data as a special type of factor called an ordered factor. Use the ordered
function to create ordered factors. The arguments to ordered are the
same as those to factor. To create an ordered version of the intensity
factor, do:
> ordered(c("Hi","Med","Lo","Hi","Hi","Lo"),
+ levels=c("Lo","Med","Hi"))
[1] Hi Med Lo Hi Hi Lo
Lo < Med < Hi
Warning
If you don’t provide a levels argument, an ordering will be placed on the levels corresponding
to the default ordering of the levels by S-PLUS.
Creating Factors From Continuous Data
To create categorical data out of numerical or continuous data, use the
cut function. You provide either a vector of specific breakpoints or an
integer specifying how many groups to divide the numerical data
into; cut then creates levels corresponding to the specified ranges. All
the values falling in any particular range are assigned the same level.
For example, the murder rates in the 50 states can be grouped into
High and Low values using cut:
> cut(state.x77[,"Murder"],breaks=c(0,8,16))
[1] 2 2 1 2 2 1 1 1 2 2 1 1 2 1 1 1 2 2 1 2 1 2 1 2 2
[26] 1 1 2 1 1 2 2 2 1 1 1 1 1 1 2 1 2 2 1 1 2 1 1 1 1
attr(, "levels"):
[1] " 0+ thru 8" "8+ thru 16"
> cut(state.x77[,"Murder"],c(0,8,16),
+ labels=c("Low","High"))
[1] 2 2 1 2 2 1 1 1 2 2 1 1 2 1 1 1 2 2 1 2 1 2 1 2 2
[26] 1 1 2 1 1 2 2 2 1 1 1 1 1 1 2 1 2 2 1 1 2 1 1 1 1
attr(, "levels"):
[1] "Low" "High"
Note
As you may notice from the style of printing in the above examples, cut does not produce factors
directly. Rather, the value returned by cut is a category object.
To create a factor from the output of cut, just call factor with the call
to cut as its only argument:
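A minimal sketch of this conversion, using a short made-up vector rates in place of the state data. The break points 0, 8, and 16 and the labels follow the earlier example; the sketch assumes that factor keeps the labels supplied to cut.

```s
# Hypothetical measurements, none of which falls on a break point
rates <- c(1.4, 15.1, 7.8, 10.3, 3.3, 11.3)

# Wrap the call to cut in factor to obtain a factor
groups <- factor(cut(rates, breaks=c(0, 8, 16),
    labels=c("Low", "High")))

levels(groups)
summary(groups)      # number of values in each group
```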
Chapter 3
DATA FRAMES
Introduction
The Benefits of Data Frames
Creating Data Frames
Rectangular Data Functions
Combining Data Frames
    Combining Data Frames by Column
    Combining Data Frames by Row
    Merging Data Frames
Converting Data Frames
Applying Functions to Subsets of a Data Frame
Adding New Classes of Variables to Data Frames
Data Frame Attributes
Chapter 3 Data Frames
INTRODUCTION
Data frames are data objects designed primarily for data analysis and
modeling. You can think of them as generalized matrices—generalized
in a way different from the way arrays generalize matrices. Arrays
generalize the dimensional aspect of a matrix; data frames generalize
the mode aspect of a matrix. Matrices can be of only one mode (for
example, "logical", "numeric", "complex", "character"). Data
frames, however, allow you to mix modes from column to column.
For example, you could have a column of "character" values, a
column of "numeric" values, a column of categorical values, and a
column of "logical" values. Each column of a data frame
corresponds to a particular variable; each row corresponds to a single
“case” or set of observations.
Creating Data Frames
11 0.593684564 present 82 5
12 0.291224646 absent 148 3
13 -0.162832145 absent 18 5
14 0.248051730 absent 1 4
16 -0.957828145 absent 168 3
17 0.051553058 absent 1 3
18 -0.294367576 absent 78 6
19 -0.001231745 absent 175 5
20 -0.225155320 absent 80 5
21 -0.192293286 absent 27 4
The names of the objects are used for the variable names in the data
frame. Row names for the data frame are obtained from the first
object with a names, dimnames, or row.names attribute having unique
values. In the above example, the object was my.df:
> my.df
Kyphosis Age Number
1 absent 71 3
2 absent 158 3
3 present 128 4
4 absent 2 5
5 absent 1 4
6 absent 1 2
7 absent 61 2
8 absent 37 3
9 absent 113 2
10 present 59 6
11 present 82 5
12 absent 148 3
13 absent 18 5
14 absent 1 4
16 absent 168 3
17 absent 1 3
18 absent 78 6
19 absent 175 5
20 absent 80 5
21 absent 27 4
The row names are not just the row numbers—in our subset, the
number 15 is missing. The fifteenth row of kyphosis, and hence
my.df, has the row name "16".
The attributes of special types of vectors (such as factors) are not lost
when they are combined in a data frame. They can be retrieved by
asking for the attributes of the particular variable of interest. More
detail is given in the section Data Frame Attributes (page 68).
Each vector adds one variable to the data frame. Matrices and data
frames provide as many variables to the new data frame as they have
columns or variables, respectively. Lists, because they can be built
from virtually any data object, are more complicated—they provide as
many variables as all of their components taken together.
When combining objects of different types into a data frame, some
objects may be altered somewhat to be more suitable for further
analysis. For example, numeric vectors and factors remain unchanged
in the data frame. Character and logical vectors, however, are
converted to factors before being included in the data frame. The
conversion is done because S-PLUS assumes that character and logical
data will most commonly be taken to be a categorical variable in any
modeling that is to follow. If you want to keep a character or logical
vector “as is” in the data frame, pass the vector to data.frame
wrapped in a call to the I function, which returns the vector
unchanged but with the added class "AsIs".
For example, consider the following logical vector, my.logical:
> my.logical
[1] T T T T T F T T F T T F T F T T T T T T
11 -0.9127547 T
12 0.1771526 F
13 0.5361920 T
14 0.3633339 F
15 0.5164660 T
16 0.4362987 T
17 -1.2920592 T
18 0.8314435 T
19 -0.6188006 T
20 1.4910625 T
> mode(my.df$b)
[1] "logical"
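The effect of I can be sketched on a small scale as follows. The vectors a and b and the name as.is.df are made up for illustration; they are not the objects used in the example above.

```r
# Hypothetical data: a numeric column and a logical column.
a <- c(0.5, -1.2, 0.3, 2.1)
b <- c(T, F, T, T)
# I() marks b to be kept "as is" rather than converted to a factor.
as.is.df <- data.frame(a = a, b = I(b))
mode(as.is.df$b)
```

The column keeps its logical mode and carries the added class "AsIs".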
> data.frame(price,country,reliab,mileage,type,
+ row.names=c("Acura","Audi","BMW","Chev","Ford",
+ "Mazda","MazdaMX","Nissan","Olds","Toyota"))
price country reliab mileage type
Acura 11950 Japan 5 NA Small
Audi 26900 Germany NA NA Medium
. . .
Rectangular Data Functions
Rectangular data functions allow you to access all rectangular data
objects in the same way. Rectangular data objects include matrices,
data frames, and atomic vectors which have the form of rows
(observations) and one or more columns (variables).
There are eight rectangular data functions you can use:
• as.rectangular converts any object to a rectangular data
object (generally a data frame).
• as.char.rect takes a rectangular object and returns a
rectangular object consisting of character strings, suitable for
printing (but not formatted to fixed width).
• is.rectangular tests whether an object is rectangular.
• sub is used for subscripting.
help(function)
3. Merging (or joining) data frames. This case arises when you
have two data frames containing some information in
common, and you want to get as much information as
possible from both data frames about the overlapping cases.
For this case, use the merge function.
All three of the functions mentioned above (cbind, rbind, and merge)
have methods for data frames, but in the usual cases, you can simply
call the generic function and obtain the correct result.
Combining Data Frames by Column
Suppose you have a data frame consisting of factor variables defining
an experimental design. When the experiment is complete, you can
add the vector of observed responses as another variable in the data
frame. In this case, you are simply adding another column to the
existing data frame, and the natural tool for this in S-PLUS is the cbind
function. For example, consider the simple built-in design matrix
oa.4.2p3, representing a half-fraction of a 2^3 design.
> oa.4.2p3
A B C
1 A1 B1 C1
2 A1 B2 C2
3 A2 B1 C2
4 A2 B2 C1
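As a rough sketch of adding a response column with cbind, the following builds a stand-in for the design by hand, so that it does not depend on the built-in object. The response values and the names design, response, and design.df are invented for illustration.

```r
# A hypothetical 4-run design in three two-level factors
# (same layout as the design shown above).
design <- data.frame(A = c("A1", "A1", "A2", "A2"),
    B = c("B1", "B2", "B1", "B2"),
    C = c("C1", "C2", "C2", "C1"))
# Made-up responses, one per run.
response <- c(12.3, 15.1, 9.8, 14.2)
# cbind adds the response as a new column.
design.df <- cbind(design, response)
names(design.df)
```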
> common.names
[1] "Acura Integra" "Acura Legend"
[3] "Audi 100" "Audi 80"
[5] "BMW 325i" "BMW 535i"
[7] "Buick Century" "Buick Electra"
. . .
Combining Data Frames by Row
Suppose you are pooling the data from several research studies. You
have data frames with observations of equivalent, or roughly
equivalent, variables for several sets of subjects. Renaming variables
as necessary, you can subscript the data sets to obtain new data sets
having a common set of variables. You can then use rbind to obtain a
new data frame containing all the observations from the studies.
For example, consider the following data frames.
7 0.07429523 0.53649764 43
8 -0.80310861 0.06334192 38
9 0.47110022 0.24843933 44
10 -1.70465453 0.78770638 45
> rand.df2 <-
+ data.frame(norm=rnorm(20),binom=rbinom(20,10,0.5),
+ chisq=rchisq(20,10))
> rand.df2
norm binom chisq
1 0.3485193 50 19.359238
2 1.6454204 41 13.547288
3 1.4330907 53 4.968438
4 -0.8531461 55 4.458559
5 0.8741626 47 2.589351
These data frames have the common variables norm and binom; we
subscript and combine the resulting data frames as follows.
> rbind(rand.df1[,c("norm","binom")],
+ rand.df2[,c("norm", "binom")])
norm binom
1 1.64542042 41
2 1.64542042 44
3 -0.13593118 53
4 0.26271524 34
5 -0.01900051 47
6 0.14986005 41
7 0.07429523 43
8 -0.80310861 38
9 0.47110022 44
10 -1.70465453 45
11 0.34851926 50
12 1.64542042 41
13 1.43309068 53
14 -0.85314606 55
15 0.87416262 47
Warning
Use rbind (and, in particular, rbind.data.frame) only when you have complete data frames, as
in the above example. Do not use it in a loop to add one row at a time to an existing data frame—
this is very inefficient. To build a data frame, write all the observations to a data file and use
read.table to read it in.
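When the observations are generated inside S-PLUS rather than read from a file, one alternative to growing a data frame row by row is to fill whole vectors and call data.frame once. This is only a sketch of that idea, not the method recommended above; the names n, id, value, and obs.df are illustrative.

```r
n <- 5
# Allocate full-length vectors first ...
id <- 1:n
value <- numeric(n)
for(i in 1:n) {
    value[i] <- i^2   # stand-in for per-observation work; no rbind
}
# ... then build the data frame in a single call.
obs.df <- data.frame(id = id, value = value)
```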
You can get basic statistics on individual rows by running any of the
four following functions in S-PLUS:
• rowMeans
• rowSums
• rowVars
• rowStdevs
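A brief illustration of the first two functions on a made-up matrix m. The remaining two functions in the list are not shown here; they presumably follow the same calling pattern.

```r
# A small hypothetical matrix: 2 rows, 3 columns.
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)
row.means <- rowMeans(m)   # mean of each row
row.sums <- rowSums(m)     # sum of each row
```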
Merging Data Frames
In many situations, you may have data from multiple sources with
some duplicated data. To get the cleanest possible data set for
analysis, you want to merge or join the data before proceeding with the
analysis. For example, player statistics extracted from Total Baseball
overlap somewhat with player statistics extracted from The Baseball
Encyclopedia. You can use the merge function to join two data frames
by their common data. For example, consider the following made-up
data sets.
> baseball.off
player years.ML BA HR
1 Whitehead 4 0.308 10
2 Jones 3 0.235 11
3 Smith 5 0.207 4
4 Russell NA 0.270 19
5 Ayer 7 0.283 5
> baseball.def
player years.ML A FA
1 Smith 5 300 0.974
2 Jones 3 7 0.990
3 Whitehead 4 9 0.980
4 Russell NA 55 0.963
These can be merged by the two columns they have in common using
merge:
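The following sketch re-creates the two made-up data sets and merges them. It assumes that merge, called with no by argument, joins on the columns the two data frames share; the result name baseball.all is illustrative. How a row with a missing key value (Russell) is matched may depend on the implementation, so the sketch makes no claim about it.

```r
baseball.off <- data.frame(
    player = c("Whitehead", "Jones", "Smith", "Russell", "Ayer"),
    years.ML = c(4, 3, 5, NA, 7),
    BA = c(0.308, 0.235, 0.207, 0.270, 0.283),
    HR = c(10, 11, 4, 19, 5))
baseball.def <- data.frame(
    player = c("Smith", "Jones", "Whitehead", "Russell"),
    years.ML = c(5, 3, 4, NA),
    A = c(300, 7, 9, 55),
    FA = c(0.974, 0.990, 0.980, 0.963))
# With no "by" argument, merge joins on the columns the two
# data frames have in common (here player and years.ML).
baseball.all <- merge(baseball.off, baseball.def)
```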
> authors
FirstName LastName Age Income Home
1 Lorne Green 82 1200000 California
2 Loren Blye 40 40000 Washington
3 Robin Green 45 25000 Washington
4 Robin Howe 2 0 Alberta
5 Billy Jaye 40 27500 Washington
> books
AuthorFirstName AuthorLastName Book
1 Lorne Green Bonanza
2 Loren Blye Midwifery
3 Loren Blye Gardening
4 Loren Blye Perennials
5 Robin Green Who_dun_it?
6 Rich Calaway Splus
Because the desired “by” columns are in the same position in both
books and authors, we can accomplish the same result more simply
as follows.
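A sketch of both forms of the call, using a subset of the columns shown above. It assumes the by.x and by.y arguments name the key columns in each data frame and that a numeric by gives their common positions; the result names by.name and by.pos are illustrative.

```r
authors <- data.frame(
    FirstName = c("Lorne", "Loren", "Robin", "Robin", "Billy"),
    LastName = c("Green", "Blye", "Green", "Howe", "Jaye"),
    Age = c(82, 40, 45, 2, 40))
books <- data.frame(
    AuthorFirstName = c("Lorne", "Loren", "Loren", "Loren",
        "Robin", "Rich"),
    AuthorLastName = c("Green", "Blye", "Blye", "Blye",
        "Green", "Calaway"),
    Book = c("Bonanza", "Midwifery", "Gardening", "Perennials",
        "Who_dun_it?", "Splus"))
# Name the key columns explicitly ...
by.name <- merge(authors, books,
    by.x = c("FirstName", "LastName"),
    by.y = c("AuthorFirstName", "AuthorLastName"))
# ... or give their common positions (columns 1 and 2 in both).
by.pos <- merge(authors, books, by = 1:2)
```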
Converting Data Frames
You may want to convert an S-PLUS data frame to a matrix. If so,
there are three different functions which take a data frame as an
argument and return a matrix whose elements correspond to the
elements of the data frame:
• as.matrix.data.frame
• numerical.matrix
• data.matrix
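The difference between the conversions can be sketched on a small data frame with one numeric and one factor column. The example calls the generic as.matrix rather than the method directly, and it does not show numerical.matrix; the names mixed.df, char.mat, and num.mat are made up.

```r
mixed.df <- data.frame(size = c(1.5, 2.5, 3.5),
    grade = factor(c("lo", "hi", "lo")))
char.mat <- as.matrix(mixed.df)    # mixed modes: character matrix
num.mat <- data.matrix(mixed.df)   # factors become numeric codes
```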
Applying Functions to Subsets of a Data Frame
Warning
For most numeric summaries, all variables in the data frame must be numeric. Thus, if we
attempt to repeat the above example with the kyphosis data, using kyphosis as the by variable,
we get an error:
For time series, aggregate returns a new, shorter time series that
summarizes the values in the time interval given by a new frequency.
For instance you can quickly extract the yearly maximum, minimum,
and average from the monthly housing start data in the time series
hstart:
The applied function supplied as the FUN argument must accept a data
frame as its first argument; if you want to apply a function that does
not naturally accept a data frame as its first argument, you must
define a function that does so on the fly. For example, one common
application of the by function is to repeat model fitting for each level
or combination of levels; the modeling functions, however, generally
have a formula as their first argument. The following call to by shows
how to define the FUN argument to fit a linear model to each level:
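A self-contained sketch of this pattern on made-up data. The data frame fit.df and its variables x, y, and g are hypothetical and stand in for the data set used in the original example.

```r
# Hypothetical data: a response y, a predictor x, a grouping factor g.
fit.df <- data.frame(
    x = c(1, 2, 3, 4, 1, 2, 3, 4),
    y = c(2.1, 3.9, 6.2, 7.8, 1.2, 1.9, 3.1, 4.2),
    g = factor(rep(c("a", "b"), c(4, 4))))
# lm takes a formula first, so wrap it in a function whose
# first argument is the data frame for one level of g.
fits <- by(fit.df, fit.df$g, function(data) lm(y ~ x, data = data))
```

Each element of fits is then a fitted model for one level of g.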
Coefficients:
(Intercept) Start
4.885736 -0.08764492
Degrees of freedom: 39 total; 37 residual
Residual standard error: 1.261852
Kyphosis:present
Older:FALSE
Call:
lm(formula = Number~Start, data = data)
Coefficients:
(Intercept) Start
6.371257 -0.1191617
Degrees of freedom: 9 total; 7 residual
Residual standard error: 1.170313
Kyphosis:absent
Older:TRUE
. . .
Warning
Again, as with aggregate, make sure that the function you pass to by works with data frames
and, in many cases, with factors as well. For example,
consider the following two examples.
kyphosis$Kyphosis:present
Kyphosis Age Number Start
NA 97.82353 5.176471 7.294118
Warning messages:
1: 64 missing values generated coercing from character to
numeric in: as.double(x)
2: 17 missing values generated coercing from character to
numeric in: as.double(x)
The functions mean and max are not very different, conceptually. Both
return a single-number summary of their input, and both are
meaningful only for numeric data. Because of implementation differences,
however, the first example returns appropriate values and the second
example stops with an error. However, when all the variables in your data frame
are numeric, or when you want to use by with a matrix, you should
encounter few difficulties.
INDICES:South
Murder Population Life.Exp
Min. : 6.20 Min. : 579 Min. :67.96
1st Qu.: 9.25 1st Qu.: 2622 1st Qu.:68.98
Median :10.85 Median : 3710 Median :70.07
Mean :10.58 Mean : 4208 Mean :69.71
3rd Qu.:12.27 3rd Qu.: 4944 3rd Qu.:70.33
Max. :15.10 Max. :12240 Max. :71.42
. . .
To compute the mean murder rate by region and income, use tapply
as follows.
> tapply(state.x77[,"Murder"],list(state.region,
+ income.lev),mean)
3098+ thru 3993 3993+ thru 4519
Northeast 4.10000 4.700000
South 10.64444 13.050000
North Central NA 4.800000
West 9.70000 4.933333
4519+ thru 4814 4814+ thru 6315
Northeast 2.85 6.40
South 7.85 9.60
North Central 5.52 5.85
West 6.30 8.40
Adding New Classes of Variables to Data Frames
As you add new classes, you can ensure that they are properly
behaved in data frames by defining your own as.data.frame method
for each new class. In most cases, you can use one of the six paradigm
cases, either as is or with slight modifications. For example, the
character method is a straightforward modification of the vector
method:
> as.data.frame.character
function(x, row.names = NULL, optional = F,
na.strings = "NA", ...)
as.data.frame.vector(factor(x,exclude =na.strings),
row.names,optional)
This method converts its input to a factor, then calls the function
as.data.frame.vector.
You can create new methods from scratch, provided they have the
same arguments as as.data.frame.
> as.data.frame
function(x, row.names = NULL, optional = F, ...)
UseMethod("as.data.frame")
The argument "..." allows the generic function to pass any
method-specific arguments to the appropriate method.
If you’ve already built a function to construct data frames from a
certain class of data, you can use it in defining your as.data.frame
method. Your method just needs to account for all the formal
arguments of as.data.frame. For example, suppose you have a class
loops and a function make.df.loops for creating data frames from
objects of that class. You can define a method as.data.frame.loops
as follows.
> as.data.frame.loops
function(x, row.names = NULL, optional = F, ...)
{
    x <- make.df.loops(x, ...)
    if(!is.null(row.names)) {
        row.names <- as.character(row.names)
        if(length(row.names) != nrow(x))
            stop(paste("Provided", length(row.names),
                "names for", nrow(x), "rows"))
        attr(x, "row.names") <- row.names
    }
    x
}
> attributes(auto)
$names:
[1] "Price" "Country" "Reliab" "Mileage" "Type"
$row.names:
[1] "AcuraIntegra4" "Audi1005" "BMW325i6"
[4] "ChevLumina4" "FordFestiva4" "Mazda929V6"
[7] "MazdaMX-5Miata" "Nissan300ZXV6" "OldsCalais4"
[10] "ToyotaCressida6"
$class:
[1] "data.frame"
The variable names are stored in the names attribute and the row
names are stored in the row.names attribute. There is also a class
attribute with value data.frame. All data frames have class attribute
data.frame.
> attributes(cu.summary[,"Country"])
$levels:
[1] "Brazil" "England" "France" "Germany"
[5] "Japan" "Japan/USA" "Korea" "Mexico"
[9] "Sweden" "USA"
$class:
[1] "factor"
Attribute Description
WRITING FUNCTIONS IN S-PLUS 4
Introduction 73
The Structure of Functions 75
Function Names and Operators 75
Arguments 78
The Function Body 78
Return Values and Side Effects 78
Elementary Functions 80
Operations on Complex Numbers 84
Summary Functions 85
Comparison and Logical Operators 86
Assignments 89
Testing and Coercing Data 91
Operating on Subsets of Data 94
Subscripting Vectors 94
Subscripting Matrices and Arrays 98
Subscripting Lists 102
Subscripting Data Frames 105
Organizing Computations 107
Programming Style 107
Flow of Control 108
Notes Regarding Commented Code 120
Specifying Argument Lists 121
Formal and Actual Names 121
Specifying Default Arguments 122
Handling Missing Arguments 122
Lazy Evaluation 123
Variable Numbers of Arguments 124
Required and Optional Arguments 125
Chapter 4 Writing Functions in S-PLUS
INTRODUCTION
Programming in S-PLUS consists largely of writing functions. The
simplest functions arise naturally as shorthand for frequently-used
combinations of S-PLUS expressions.
For example, consider the interquartile range, or IQR, of a data set.
Given a collection of data points, the IQR is the difference between
the upper and lower (or third and first) quartiles of the data. Although
S-PLUS has no built-in function for calculating the IQR, it does have
functions for computing quantiles and differences of numeric vectors.
The following two commands define and test a function that returns
the IQR of a numeric vector.
75%
169.75
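A function of this kind might look like the sketch below, which combines quantile and diff as described. The name iqr.sketch is hypothetical, and the test input 1:9 is not the data set used in the original example.

```r
# iqr.sketch is a hypothetical name; the function returns the
# difference between the third and first quartiles.
iqr.sketch <- function(x)
{
    diff(quantile(x, c(0.25, 0.75)))
}
iqr.sketch(1:9)
```

Because diff keeps the name of the upper quantile, the result is labeled 75%, as in the output above.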
> Edit(newfunc)
The Structure of Functions
Function Names and Operators
Most functions are associated with names when they are defined. The
form of the name conveys some important information about the
nature of the function. Most functions have simple, relatively short,
alphanumeric names that begin with a letter, such as plot,
na.exclude, or anova. These functions are always used in the form
function.name(arglist).
> 7 + 5 - 8^2 / 19 * 2
[1] 5.263158
Here, the exponentiation is done first, 8^2=64. Division has the same
precedence as multiplication, but appears to the left of the
multiplication in the expression. Therefore, it is performed first:
64/19=3.368421. Next comes the multiplication:
3.368421*2=6.736842. Finally, S-PLUS performs the addition and
subtraction: 7+5-6.736842=5.263158.
You can override the normal precedence of operators by grouping
with parentheses or curly braces:
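As an illustration, the sketch below regroups the expression evaluated earlier. The particular grouping and the object names default.order and grouped are arbitrary choices made for this example.

```r
# Default precedence: ^ first, then / and *, then + and -.
default.order <- 7 + 5 - 8^2 / 19 * 2
# Parentheses force the sum to be formed before exponentiation,
# and the product 19 * 2 before the division.
grouped <- (7 + 5 - 8)^2 / (19 * 2)
```

Here grouped is 16/38, which differs from the value obtained under the default precedence.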
Table 4.1: Precedence of operators. Operators listed higher in the table have higher
precedence than those listed below, and operators on the same line have equal
precedence.
Operator    Use
$           component selection
@           slot selection
[ [[        subscripts, elements
^           exponentiation
-           unary minus
:           sequence operator
* /         multiply, divide
. . .
!           not
~           formulas
Note
When using the ^ operator, the exponent must be an integer if the base is a negative number. If
you require a complex result when the base is negative, be sure to coerce it to mode "complex".
See the section Operations on Complex Numbers (page 84) for more details.
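The point of this note can be sketched with a cube root of a negative number. The base -8 and the object names real.root and complex.root are chosen only for illustration.

```r
# Negative base with a non-integer exponent: no real result,
# so a missing value is returned.
real.root <- (-8)^(1/3)
# Coercing the base to complex gives the principal complex root.
complex.root <- (-8 + 0i)^(1/3)
```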
Arguments
Arguments to a function specify the data to be operated on, and also
pass processing parameters to the function. Not all functions accept
arguments. For example, the date function can only be called with
the syntax date():
> args(date)
function()
> args(lm)
function(formula, data, weights, subset, na.action,
method = "qr", model = F, x = F, y = F, contrasts = NULL,
...)
The Function Body
The body of a function is the part that actually does the work. It
consists of a sequence of S-PLUS statements and expressions. If there
is more than one expression, the entire body must be enclosed in
braces. Whether braces should always be included is a matter of
programming style; we recommend including them in all of your
functions because it makes maintenance less accident-prone. By
adding braces when you define a single-line function, you ensure they
won’t be forgotten when you add functionality to it.
Most of this chapter (and, in fact, most of this book) is devoted to
showing you how to write the most effective function body possible.
This involves organizing the computations efficiently and naturally,
expressing them with suitable S-PLUS expressions, and returning the
appropriate information.
In this expression, the return value from the function f on the input x
is preserved in the object y for further analysis.
Note
In compiled languages such as C and Fortran, you can pass arguments directly to a function that
modifies the argument values in memory. In S-PLUS however, all arguments are passed by value.
This means that only copies of the arguments are modified throughout the body of a function.
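A small sketch of what passing by value means in practice. The function add.one and the objects orig and changed are hypothetical.

```r
# add.one is a hypothetical function that changes its argument.
add.one <- function(v)
{
    v[1] <- v[1] + 1   # modifies the local copy only
    v
}
orig <- c(10, 20, 30)
changed <- add.one(orig)
```

After the call, orig still holds its original values; only the returned copy, saved in changed, reflects the modification.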
> sqrt(M)
> tan(M)
Note that both sqrt(M) and tan(M) return objects that are the same
shape as M. The element in the ith row and jth column of the matrix
returned by sqrt(M) is the square root of the corresponding element
in M. Likewise, the element in the ith row and the jth column of
tan(M) is the tangent of the corresponding element (assumed to be in
radians).
The trunc function acts like floor for elements greater than 0 and
like ceiling for elements less than 0:
> y <- c(-2.6, 1.5, 9.7, -1.0, 25.7, -4.6, -7.5, -2.7, -0.6,
+ -0.3, 2.8, 2.8)
> y
[1] -2.6 1.5 9.7 -1.0 25.7 -4.6 -7.5 -2.7 -0.6
[10] -0.3 2.8 2.8
> trunc(y)
[1] -2 1 9 -1 25 -4 -7 -2 0 0 2 2
> ceiling(y)
[1] -2 2 10 -1 26 -4 -7 -2 0 0 3 3
> floor(y)
[1] -3 1 9 -1 25 -5 -8 -3 -1 -1 2 2
If we call fac1024 with n=12 it works fine, but n=13 causes it to return
NA:
> fac1024(12)
[1] 479001600
> fac1024(13)
[1] NA
> fac1024(13.0)
[1] 6227020800
With the function defined like this, the call fac1024(13) finishes
without overflowing.
> (2-3i)*(4+6i)
[1] 26+0i
> (2+3i)^(3+2i)
[1] 4.714144-4.569828i
Warning
Do not leave any space between the real number b and the symbol i when defining complex
numbers. If space is included between b and i, the following syntax error is returned:
Problem: Syntax error: illegal name ("i")
> sqrt(-1)
[1] NA
> sqrt(-1+0i)
[1] 6.123032e-017+1i
> Re(x^(1/3))
[1] 0.7211248
> Im(x^(1/3))
[1] 1.249025
> Conj(x^(1/3))
[1] 0.7211248-1.249025i
The Mod and Arg functions return the modulus and argument,
respectively, for the polar representation of a complex number:
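For instance, with a made-up complex number z (not the value used in the examples above):

```r
z <- 3 + 4i         # hypothetical value
z.mod <- Mod(z)     # modulus: sqrt(3^2 + 4^2)
z.arg <- Arg(z)     # argument, in radians
```

The pair (z.mod, z.arg) is the polar representation of z, so z.mod * cos(z.arg) recovers the real part.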
Name Operation
min, max Return the smallest and largest values of the input arguments.
range Returns a vector of length two containing the minimum and maximum
of all the elements in all the input arguments.
mean, median Return the arithmetic mean and median of the input arguments. The
optional trim argument to mean allows you to discard a specified
fraction of the largest and smallest values.
quantile Returns user-requested sample quantiles for a given data set. For
example,
> quantile(corn.rain, c(0.25, 0.75))
25% 75%
9.425 12.075
summary Returns the minimum, maximum, first and third quartiles, mean, and
median of a numeric vector.
Comparison and Logical Operators
Table 4.4 lists the S-PLUS operators for comparison and logic.
Comparisons and logical operations are frequently convenient for
such tasks as extracting subsets of data. In addition, conditionals using
! not
Notice that S-PLUS has two types of logical operators for AND and
OR operations. Table 4.4 refers to the two types as “vectorized” and
“control.” The vectorized operators evaluate AND and OR expressions
element-by-element, returning a logical vector containing TRUE and
FALSE as appropriate. For example:
> x <- c(1.9, 3.0, 4.1, 2.6, 3.6, 2.3, 2.8, 3.2, 6.6,
+ 7.6, 7.4, 1.0)
> x
[1] 1.9 3.0 4.1 2.6 3.6 2.3 2.8 3.2 6.6 7.6 7.4 1.0
The control operators have the additional property that they are
evaluated only as far as necessary to return a correct value. For
example, consider the following expression for some numeric vector
y:
The any function evaluates to TRUE if any of the elements in any of its
arguments are true; it returns FALSE if all of the elements are false.
Likewise, the all function evaluates to TRUE if all of the elements in all
of its arguments are true; it returns FALSE if there are any false
elements. S-PLUS initially evaluates only the first condition in the
above expression, any(x > 1). After determining that x > 1 for some
element in x, only then does S-PLUS proceed to evaluate the second
condition, all(y < 0).
Similarly, consider the following command:
S-PLUS stops evaluation with all(x >= 1) and returns TRUE, even
though the statement 2 > 7 is false. Because the first condition is true,
so is the entire expression.
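The two cases described above can be sketched as follows, assuming && and || are the control forms of AND and OR. The vector x is the one defined earlier; the values of y and the object names are made up. The last line uses stop as a second operand purely to show that it is never evaluated.

```r
x <- c(1.9, 3.0, 4.1, 2.6, 3.6, 2.3, 2.8, 3.2, 6.6, 7.6, 7.4, 1.0)
y <- c(-1, -2, -3)   # hypothetical values for y
# First condition is true, so the second must also be evaluated.
both <- any(x > 1) && all(y < 0)
# First condition is true, so the second is skipped.
either <- all(x >= 1) || 2 > 7
# stop() would signal an error if it were evaluated; it is not.
skipped <- all(x >= 1) || stop("never reached")
```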
Logical comparisons involving the symbolic constants NA and NULL
always return NA, regardless of the type of operator used. For
example:
> y > 0
[1] T NA T
> is.na(y)
[1] F T F
> is.null(names(kyphosis))
[1] F
> is.null(names(letters))
[1] T
For more details on functions such as is.na and is.null, see the
section Testing and Coercing Data (page 91).
Warning
In addition to object assignments, the equals sign is used for argument assignments within a
function definition. Because of this, there are some ambiguities that you must be aware of when
using the equals sign as an assignment operator. For example, the command
> print(x <- myfunc(y))
assigns the value from myfunc(y) to the object x and then prints x. Conversely, the command
> print(x = myfunc(y))
simply prints the value of myfunc(y) and does not perform an assignment. This is because the
print function has an argument named x, and argument assignment takes precedence over
object assignment with the equals sign. Because of these ambiguities, we discourage the use of the
equals sign for left assignment.
your working data directory are overwritten if they exist. This can
lead to lost data. For this reason, we discourage the use of <<- within
functions.
A more general form of assignment uses the assign function. The
assign function allows you to choose where the assignment takes
place. You can assign an object to either a position in the search list or
a particular frame. For example, the following command assigns the
value 3 to the name boo on the session frame 0:
Testing and Coercing Data
Most functions expect input data of a particular type. For example,
mathematical functions expect numeric input while text processing
functions expect character input. Other functions are designed to
work with a wide variety of input data and have internal branches
that use the data type of the input to determine what to do.
Unexpected data types can often cause a function to stop and return
error messages. To protect against this behavior, many functions
include expressions that test whether the input data is of the right type
and coerce the data if necessary. For example, mathematical functions
frequently have conditionals of the following form:
This statement tests the input data x with the is function. If x is not
numeric, it is coerced to a numeric object with the as function.
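A sketch of this test-then-coerce idiom wrapped in a small function. The name to.numeric is hypothetical; the calls to is and as take the object and the name of the target class, as described above.

```r
# to.numeric is a hypothetical helper showing the idiom.
to.numeric <- function(x)
{
    if(!is(x, "numeric"))
        x <- as(x, "numeric")   # coerce only when needed
    x
}
to.numeric("3.5")
```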
As we discuss in Chapter 1, The S-PLUS Language, older versions of
S-PLUS (S-PLUS 3.x, 4.x, and 2000) were based on version 3 of the S
language (SV3). Most testing of SV3 objects is done with functions
having names of the form is.type, where type is a recognized data
type. For example, the functions is.vector and is.matrix test
whether the data type of an object is a vector and a matrix,
respectively. Functions also exist to test for special values such as NULL
and NA; see the section Comparison and Logical Operators (page 86)
for more information.
For a list of atomic modes, see the help file for the mode function.
Newer versions of S-PLUS (S-PLUS 5.x and later) are based on version
4 of the S language (SV4), which implements a vastly different
approach to classes. In SV4, the is.type and as.type functions are
collapsed into the simpler is and as functions. For example, to test
whether an object x is numeric, type:
Table 4.5: Common functions for testing and coercing data objects.
> x <- c(1.9, 3.0, 4.1, 2.6, 3.6, 2.3, 2.8, 3.2, 6.6,
+ 7.6, 7.4, 1.0)
> x
[1] 1.9 3.0 4.1 2.6 3.6 2.3 2.8 3.2 6.6 7.6 7.4 1.0
> x[3]
[1] 4.1
The next command returns the third, fifth, and ninth elements:
> x[c(3,5,9)]
[1] 4.1 3.6 6.6
> x[c(5,5,8)]
[1] 3.6 3.6 3.2
> x[12:1]
[1] 1.0 7.4 7.6 6.6 3.2 2.8 2.3 3.6 2.6 4.1 3.0 1.9
> x[-(3:5)]
[1] 1.9 3.0 2.3 2.8 3.2 6.6 7.6 7.4 1.0
> x[-13]
[1] 1.9 3.0 4.1 2.6 3.6 2.3 2.8 3.2 6.6 7.6 7.4 1.0
> x > 2
[1] F T T T T T T T T T T F
The next command returns the elements in x that are between 2 and
4:
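One way to write such a command is sketched below, combining two comparisons with the vectorized AND operator and using the result as a logical index. The object name mid is illustrative.

```r
x <- c(1.9, 3.0, 4.1, 2.6, 3.6, 2.3, 2.8, 3.2, 6.6, 7.6, 7.4, 1.0)
# Keep the elements strictly greater than 2 and strictly less than 4.
mid <- x[x > 2 & x < 4]
```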
Logical index vectors are generally the same length as the vectors to
be subscripted. However, this is not a strict requirement, as S-PLUS
recycles the values in a short logical vector so that its length matches a
longer vector. Thus, you can use the following command to extract
every third element from x:
> x[c(F,F,T)]
[1] 4.1 2.3 6.6 1.0
The index vector c(F,F,T) is repeated four times so that its length
matches the length of x. Likewise, the following command extracts
every fifth element from x:
> x[c(F,F,F,F,T)]
[1] 3.6 7.6
In this case, the index vector is repeated three times, and no values
are returned for indices greater than length(x).
> length(state.abb)
[1] 50
> names(state.abb)
NULL
> length(state.name)
[1] 50
> state.name
Alaska Hawaii
"AK" "HI"
Subscripting Matrices and Arrays
Subscripting data sets that are matrices or arrays is very similar to
subscripting vectors. In fact, you can subscript them exactly like
vectors if you keep in mind that arrays are stored in column-major
order. You can think of the data values in an array as being stored in
one long vector that has a dim attribute to specify the array’s shape.
Column-major order states that the data values fill the array so that
the first index changes the fastest and the last index changes the
slowest. For matrices, this means that data values are filled in column-
by-column.
For example, suppose we have the following matrix M:
> M[8]
[1] 2
This corresponds to the element in the second row and third column
of M. When a matrix is subscripted in this way, the element returned is
a single number without dimension attributes. Thus, S-PLUS does not
recognize it as a matrix.
S-PLUS also lets you use the structure of arrays to your advantage by
allowing you to specify one subscript for each dimension. Since
matrices have two dimensions, you can specify two subscripts inside
the square brackets. The matrix subscripts correspond to the row and
column indices, respectively:
> M[2,3]
[1] 2
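The correspondence between the two forms of subscripting can be checked on a made-up matrix. The matrix m below is hypothetical and is not the matrix M used in this section; its values equal their storage positions, which makes the column-major order easy to see.

```r
# A hypothetical 3 x 4 matrix filled column by column.
m <- matrix(1:12, nrow = 3)
by.position <- m[8]     # eighth stored value
by.rowcol <- m[2, 3]    # same element: row 2, column 3
```

With three rows, columns 1 and 2 hold the first six values, so the eighth value falls in row 2 of column 3.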
[,1] [,2]
[1,] 15 10
[2,] 14 19
The next command returns values from the same two columns,
including all rows except the first:
[,1] [,2]
[1,] 9 7
[2,] 14 19
The next example illustrates how you can use a logical vector to
subscript a matrix or array. We use the built-in data matrix state.x77,
which contains demographic information on all fifty states in the
USA. The third column of the matrix, Illiteracy, gives the percent
of the population in a given state that was illiterate at the time of the
1970 census. We first copy this column into an object named illit:
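A sketch of the copy and of one possible logical subscript built from it. The cutoff of 2 percent and the object name high.illit are arbitrary choices for this illustration.

```r
illit <- state.x77[, "Illiteracy"]
# Keep the rows for states whose illiteracy rate exceeds 2 percent
# (the cutoff is arbitrary).
high.illit <- state.x77[illit > 2, ]
```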
> dim(state.x77)
[1] 50 8
> dimnames(state.x77)
[[1]]:
[1] "Alabama" "Alaska" "Arizona"
[4] "Arkansas" "California" "Colorado"
[7] "Connecticut" "Delaware" "Florida"
[10] . . .
[[2]]:
[1] "Population" "Income" "Illiteracy" "Life.Exp"
[5] "Murder" "HS.Grad" "Frost" "Area"
> M[1,3]
[1] 6
[,1]
[1,] 6
> dim(K)
[1] 1 1
[,1] [,2]
[1,] 1 2
[2,] 3 3
> M[subscr.mat]
[1] 15 11
Subscripting Lists
Lists are vectors of class "list" that can hold arbitrary S-PLUS objects
as individual elements. For example:
> mode(mylist[1])
[1] "list"
> mylist[[1]]
[1] "Tom" "Dick" "Harry"
> mode(mylist[[1]])
[1] "character"
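The difference between the two subscript operators can be reproduced with a list like the one below. This definition of mylist is an assumption chosen to be consistent with the output shown; the second component y is invented.

```r
# A hypothetical list consistent with the output shown above.
mylist <- list(x = c("Tom", "Dick", "Harry"), y = 1:3)
single <- mylist[1]      # single brackets return a list
double <- mylist[[1]]    # double brackets return the element itself
mode(single)
mode(double)
```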
> biglist
$lista:
$lista$list1:
$lista$list1$x:
[1] 1 2 3 4 5 6 7 8 9 10
$lista$list1$y:
[1] 10 11 12 13 14 15 16 17 18 19 20
$lista$list2:
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m"
[14] "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"
$listb:
$listb[[1]]:
[1] "a"
$listb[[2]]:
[1] "r"
$listb[[3]]:
[1] "e"
> biglist[[1]][[1]][[2]]
[1] 10 11 12 13 14 15 16 17 18 19 20
> biglist[[c(1,1,2)]]
[1] 10 11 12 13 14 15 16 17 18 19 20
If the elements of a list are named, the named elements are called
components and can be extracted by either the list subscript operator or
the component operator $. For example:
> mylist$x
[1] "Tom" "Dick" "Harry"
> mode(mylist$x)
[1] "character"
You can extract components of embedded lists with nested use of the
component operator:
> biglist$lista$list2
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m"
[14] "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"
You can also supply a vector of component names to the list subscript
operator. The effect is the same as supplying a vector of component
numbers, as in the biglist[[c(1,1,2)]] command above. For
example, the following extracts the list2 component of lista in
biglist:
> biglist[[c("lista","list2")]]
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m"
[14] "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"
> biglist[["lista"]]$list1
$x:
[1] 1 2 3 4 5 6 7 8 9 10
$y:
[1] 10 11 12 13 14 15 16 17 18 19 20
Subscripting Data Frames
Data frames share characteristics of both matrices and lists. Thus,
subscripting data frames shares characteristics of subscripting both
matrices and lists. In the examples below, we illustrate the ways in
which you can subscript data frames.
First, we form a data frame from numerous built-in data sets that
contain information on the 50 states in the USA:
Like lists, data frames also have components that you can access with
the component operator. Data frame components are the named
columns and can be accessed just like list components. For example,
the following command returns the Population column of
state.data:
> state.data$Population
Population Area
Alabama 3615 50708
Alaska 365 566432
Arizona 2212 113417
Arkansas 2110 51945
California 21198 156361
Colorado 2541 103766
Connecticut 3100 4862
. . .
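The list-style and matrix-style forms can be compared side by side on a small data frame. This is a sketch only: the three-row states object is hypothetical, with values copied from the output above, and is not the state.data object itself.

```r
# Hypothetical three-row data frame; values copied from the output above.
states <- data.frame(Population = c(3615, 365, 2212),
    Area = c(50708, 566432, 113417),
    row.names = c("Alabama", "Alaska", "Arizona"))

pop1 <- states$Population             # list style: component operator
pop2 <- states[["Population"]]        # list style: list subscript operator
pop3 <- states[, "Population"]        # matrix style: all rows, one column
ak.area <- states["Alaska", "Area"]   # matrix style: one row, one column
big <- states[states$Area > 100000, ] # logical row subscript keeps whole rows
```

The first three expressions return the same numeric vector, while the last returns a data frame containing only the rows that satisfy the condition.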
ORGANIZING COMPUTATIONS
As with any programming task, the key to successful S-PLUS
programming is to organize your computations before you start.
Break the problem into pieces and use the appropriate tools to
complete each piece. Be sure to take advantage of existing functions
rather than writing new code to perform routine tasks.
S-PLUS programming in particular requires one additional, crucial bit
of wisdom: treat every object as a whole. Treating objects as
whole entities is the basis for vectorized computation. You should
avoid operating on individual observations, as such computations in
S-PLUS carry a high premium in both memory use and processing
time. Operating on whole objects is made simpler by a very flexible
subscripting capability, as discussed in the previous section. In most
cases where for loops (or other loop constructs) seem the most
natural way to access individual data elements, you will gain
significantly in performance by using some form of subscripting.
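As a small illustration of replacing a loop with subscripting, the sketch below sums the squares of the positive elements of a vector in two ways. The data and object names are made up for the example.

```r
x <- c(-2, 5, 0, 7, -1, 3)

# Element-by-element version: correct, but slow on long vectors
total.loop <- 0
for(i in 1:length(x)) {
    if(x[i] > 0) total.loop <- total.loop + x[i]^2
}

# Whole-object version: a logical subscript selects the positive values
total.vec <- sum(x[x > 0]^2)
```

Both versions give the same total; the second states the computation in a single expression and leaves the iteration to S-PLUS.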
In this section, we provide some high-level suggestions for good
S-PLUS programming style. In addition, we discuss common control
functions such as if, ifelse, and return.
Table 4.6: S-PLUS constructions that allow you to override the normal flow of control.
Construction          Description
break                 Terminates the current loop and passes control out of
                      the loop.
repeat {expression}   Simpler version of the while statement. No tests are
                      performed and expression is evaluated indefinitely.
                      Because repeat statements have no natural termination,
                      they should contain break, return and/or stop statements.
The if and stop Statements
The if statement is the most common branching construction in
S-PLUS. The syntax is simple:
if(condition) { expression }
For example:
if (!is(x,"numeric"))
    stop("Data must be of mode numeric")
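To see the check in context, here is a minimal sketch of a complete function that validates its argument before computing. The function name geo.mean and its computation are hypothetical, and the test uses is.numeric rather than is(x, "numeric") purely as a simplifying assumption.

```r
# Hypothetical helper: the name and the computation are illustrative only.
geo.mean <- function(x) {
    # Refuse non-numeric input before doing any arithmetic
    if(!is.numeric(x))
        stop("Data must be of mode numeric")
    exp(mean(log(x)))
}

geo.mean(c(1, 10, 100))
```

Calling geo.mean with a character vector halts at the stop call, so the arithmetic on the last line of the function is never attempted.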
The stop function stops evaluation of the calling function at the point
where stop occurs. It takes a single argument that should evaluate to
a character string. If such an argument is supplied, the string is printed
to the screen as the text of an error message. For example, under
normal error handling, the above example yields the following output
if x is not numeric:
Note
S-PLUS recognizes NA as a logical value, giving three possibilities for logical data: TRUE, FALSE, and
NA. If an if statement encounters NA, the calling function terminates and returns a message of the
following form:
Multiple Cases: The if and switch Statements
One of the most common uses of the if statement is to provide
branching for multiple cases. S-PLUS has no formal “case” statement,
so you often implement cases using the following general form:
if(case1) { expression1 }
else if(case2) { expression2 }
else if(case3) { expression3 }
. . .
else lastexpression
We must use the escape character \ in the stop message so that the
double quotes are recognized.
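A switch-based version of multiple-case branching might look like the sketch below. It is a reconstruction consistent with the my.ran call and error message that appear in the Error Handling section; the argument names n and shape, and the body itself, are assumptions rather than the original listing.

```r
# Sketch only: n and shape are assumed argument names.
my.ran <- function(n, distribution, shape) {
    switch(distribution,
        gamma = rgamma(n, shape),
        exp = rexp(n),
        norm = rnorm(n),
        # Unnamed final argument: evaluated when no case matches
        stop("distribution must be \"gamma\", \"exp\", or \"norm\""))
}
```

Because shape is only needed by the gamma case, a call such as my.ran(10, distribution = "norm") works without supplying it.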
The ifelse Statement
The ifelse statement is a vectorized version of the if statement. The
syntax is:
Not only is the version using ifelse much quicker, but it also handles
missing values:
[1] 1 1 NA -1
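The difference between a loop of if statements and a single ifelse call can be sketched as follows. The input vector is made up for the example and is not the data used in the original comparison.

```r
x <- c(2, 5, NA, -3)

# Loop version: if() cannot branch on NA, so missing values need a guard
sign.loop <- numeric(length(x))
for(i in 1:length(x)) {
    if(is.na(x[i])) sign.loop[i] <- NA
    else if(x[i] > 0) sign.loop[i] <- 1
    else sign.loop[i] <- -1
}

# Vectorized version: an NA in the test propagates to the result
sign.vec <- ifelse(x > 0, 1, -1)
```

Both objects hold the same values, but the ifelse version needs no explicit test for missing data.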
> ifelse
if(length(na))
test[na] <- T
answer[!test] <- rep(no, length.out = n)[!test]
answer
}
Warning
Note from the code above that ifelse subscripts using single numeric indices. Thus, it is designed
to work primarily with vectors and, as an extension, matrices. If you subscript a data frame with
a single index, S-PLUS treats the data frame as a list and returns an entire column; for this reason,
you should exercise care when using ifelse with data frames. For details on subscripting, see the
section Operating on Subsets of Data (page 94).
For more hints on replacing for loops, see Chapter 21, Using Less
Time and Memory.
The break, next and return Statements
It is often either necessary or prudent to leave a loop before it reaches
its natural end. This is imperative in the case of a repeat statement,
which has no natural end. In S-PLUS, you exit loops using one of three
statements: break, next, and return. Of these, return exits not only
from the current loop, but also from the current function. The break
and next statements allow you to exit from loops in the following
ways:
• The break statement tells S-PLUS to exit from the current loop
and continue processing with the first expression following
the loop.
• The next statement tells S-PLUS to exit from the current
iteration of the loop and continue processing with the next
iteration.
For example, the function below simulates drawing a card from a
standard deck of 52 cards. If the card is not an ace, it is replaced and
another card is drawn. If the card is an ace, its suit is noted, it is
replaced, and another card is drawn. The process continues until all
four aces are drawn, at which time the function returns a statement of
how many draws it took to return all the aces.
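One way to write such a simulation is sketched below. The function name draw.aces, the numbering of the cards, and the choice to return the count rather than a printed statement are all assumptions made for illustration.

```r
# Sketch: cards are numbered 1 to 52, and cards 1, 14, 27, 40 are the aces.
draw.aces <- function() {
    aces.seen <- rep(F, 4)          # one flag per suit
    ndraws <- 0
    repeat {
        card <- sample(1:52, 1)     # draw one card, with replacement
        ndraws <- ndraws + 1
        if(card %% 13 != 1) next    # not an ace: go to the next draw
        suit <- (card - 1) %/% 13 + 1
        aces.seen[suit] <- T        # note the suit of this ace
        if(all(aces.seen)) break    # all four aces seen: leave the loop
    }
    ndraws
}
```

Here next skips the rest of the current iteration, while break ends the loop so that the draw count can be returned.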
The repeat Statement
The repeat statement is the simplest looping construction in S-PLUS.
It performs no tests, but simply repeats a given expression
indefinitely. Because of this, the repeated expression should include a
way out, typically using either a break or return statement. The
syntax for repeat is:
repeat { expression }
For example, the function below uses Newton’s method to find the
positive, real jth roots of a number. A test for convergence is included
inside the loop and a break statement is used to exit from the loop.
> newton(4:9)
[1] 2.000000 2.236068 2.449490 2.645751 2.828427 3.000000
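A function of this kind might be written as in the sketch below. The argument names j and x, the tolerance, and the element-wise convergence test are assumptions; the original listing may differ, in particular in how it tests for convergence.

```r
# Sketch: solve x^j - n = 0 by Newton's method, starting from x.
newton <- function(n, j = 2, x = 1) {
    old.x <- x
    repeat {
        # Newton update for f(x) = x^j - n
        x <- old.x - (old.x^j - n) / (j * old.x^(j - 1))
        # Assumed criterion: every element has a small relative change
        if(all(abs(x - old.x) <= 1e-8 * abs(x))) break
        old.x <- x
    }
    x
}
```

Because the arithmetic is vectorized, a call such as newton(4:9) computes all six square roots in the same loop.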
To condense the code, we can replace the break statement inside the
loop with a return statement. This makes it clear what the returned
value is and avoids the need for any statements outside the loop:
Note
The newton function is vectorized, as most S-PLUS functions should be. Thus, the convergence
criterion given above is not ideal for Newton’s method, since it does not check the convergence of
individual values. The code is provided here to illustrate the repeat and break statements; if you
wish to use the code in your work, you may want to experiment with different convergence
conditions.
The while Statement
You use the while statement to loop over an expression until a true
condition becomes false. The syntax is simple:
while(condition) { expression }
> bitstring(13)
[1] 1 1 0 1
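A while loop that produces this kind of result can be sketched as follows. This is an assumed implementation that converts a non-negative whole number to its binary digits; the original bitstring listing may be written differently.

```r
# Sketch: binary digits of a non-negative whole number, high bit first.
bitstring <- function(n) {
    bits <- numeric(0)
    while(n > 0) {
        bits <- c(n %% 2, bits)   # prepend the lowest remaining bit
        n <- n %/% 2              # drop that bit
    }
    bits
}
```

The loop stops as soon as the condition n > 0 becomes false, so no explicit break is needed.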
The for Statement
Using for loops is a traditional programming technique that is fully
supported in S-PLUS. Thus, you can translate most Fortran-like DO
loops directly into S-PLUS for loops and expect them to work.
However, as we have stated, using for loops in S-PLUS is usually not a
good technique because loops do not treat data objects as whole
objects. Instead, they attack the individual elements of data objects,
which is often a less efficient approach in S-PLUS. You should always
be suspicious of lines in S-PLUS functions that have the following
form:
The index variable (i in the above example) has scope only within
the body of the for loop.
Note that there are certain situations in which for loops may be
necessary in S-PLUS:
• when the calculation on the (i+1)st element in a vector or array
depends on the result of the same calculation on the ith
element.
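A recurrence of this kind is sketched below, using made-up data: each balance depends on the balance computed in the previous step, so the elements cannot all be computed at once by simple subscripting.

```r
# Hypothetical recurrence: each balance depends on the previous one.
rate <- 0.05
deposit <- c(100, 200, 0, 50)
balance <- numeric(length(deposit))
balance[1] <- deposit[1]
for(i in 2:length(deposit)) {
    balance[i] <- balance[i - 1] * (1 + rate) + deposit[i]
}
```

Note that the result vector is allocated once before the loop rather than grown inside it.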
Specifying Argument Lists
Formal and Actual Names
When you define an S-PLUS function, you specify the arguments the
function accepts by means of formal names. Formal names can be any
combination of letters, numbers, and periods, as long as they are
syntactically valid and do not begin with a number. The formal name
... (three dots) is used to pass arbitrary arguments to a function; we
discuss this in the section Variable Numbers of Arguments (page 124).
For example, consider the argument list of the hist function:
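> args(hist)
function(x, nclass = "Sturges", breaks, plot = TRUE,
    probability = FALSE, include.lowest = T, ...,
    xlab = deparse(substitute(x)))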
The formal names for this argument list are x, nclass, breaks, plot,
probability, include.lowest, ..., and xlab.
When you call a function, you specify actual names for each argument.
Unlike formal names, an actual name can be any valid S-PLUS
expression that makes sense to the function. You can thus provide a
function call such as length(x) as an argument. For example, suppose
we want to create a histogram of the Mileage column in the
fuel.frame data set:
> hist(fuel.frame$Mileage)
Specifying Default Arguments
In general, there are two ways to specify default values for arguments
in an S-PLUS function:
• The simplest way is to use the structure formalname=value
when defining a formal argument. For example, consider
again the argument list for the hist function.
> args(hist)
function(x, nclass = "Sturges", breaks, plot = TRUE,
    probability = FALSE, include.lowest = T, ...,
    xlab = deparse(substitute(x)))
if(missing(breaks)) {
    if(is.character(nclass))
        nclass <- switch(casefold(nclass),
            sturges = nclass.sturges(x),
            fd = nclass.fd(x),
            scott = nclass.scott(x),
            stop("Nclass method not recognized"))
    else if(is.function(nclass)) nclass <- nclass(x)
    breaks <- pretty(x, nclass)
    if(length(breaks) == 1) {
        if(abs(breaks) < .Machine$single.xmin * 100)
            breaks <- c(-1, -0.5, 0.5, 1)
        else if(breaks < 0)
            breaks <- breaks * c(1.3, 1.1, 0.9, 0.7)
        else
            breaks <- breaks * c(0.7, 0.9, 1.1, 1.3)
    }
}
S-PLUS doesn’t need the value for y until the final expression, at
which time it can be successfully evaluated. In many programming
languages, such a function definition causes errors similar to
Undefined variable sqrt(z1). In S-PLUS, however, arguments
aren’t evaluated until the function body requires them.
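Lazy evaluation of a default argument can be sketched with a small hypothetical function. The name lazy.demo and its body are illustrative and are not the original listing; only the default value sqrt(z1) is taken from the text above.

```r
# Hypothetical function: the default for y refers to z1, which does not
# exist until the body creates it.
lazy.demo <- function(x, y = sqrt(z1)) {
    z1 <- x^2    # z1 is defined inside the function frame
    x + y        # y is first needed here, so sqrt(z1) is evaluated now
}

lazy.demo(3)
lazy.demo(3, y = 1)
```

The first call succeeds because the default expression is evaluated in the function frame only when y is first used, by which time z1 exists.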
if(plot)
    invisible(barplot(counts, width = breaks,
        histo = T, ..., xlab = xlab))
The counts, breaks, and xlab objects are generated in the hist code
and passed to the formal arguments in barplot. In addition, anything
the user specifies that is not an element of the hist argument list is
given to barplot through the ... argument.
In general, arbitrary arguments can be passed to any function. You
can, for example, create a function that computes the mean of an
arbitrary number of data sets using the mean and c functions as
follows:
As a variation, you can use the list function to loop over arguments
and compute the individual means of an arbitrary number of data
sets:
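Both uses of the ... argument can be sketched as follows. The function names pooled.mean and each.mean are hypothetical, chosen for this illustration.

```r
# Pooled mean: combine every argument into one vector, then average
pooled.mean <- function(...) mean(c(...))

# Individual means: capture the arguments as a list and loop over it
each.mean <- function(...) {
    dsets <- list(...)
    means <- numeric(length(dsets))
    for(i in seq(along = dsets)) means[i] <- mean(dsets[[i]])
    means
}
```

For example, pooled.mean(1:3, c(10, 20)) averages all five values together, while each.mean(1:3, c(10, 20)) returns one mean per data set.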
Required and Optional Arguments
Required arguments are those for which a function definition provides
neither a default value nor missing-argument instructions. All other
arguments are optional. For example, consider again the argument list
for hist:
> args(hist)
function(x, nclass = "Sturges", breaks, plot = TRUE,
    probability = FALSE, include.lowest = T, ...,
    xlab = deparse(substitute(x)))
ERROR HANDLING
An often neglected aspect of function writing is error handling, in
which you specify what to do if something goes wrong. When writing
quick functions for your own use, it doesn’t make sense to invest
much time in “bullet-proofing” your functions: that is, in testing the
data for suitability at each stage of the calculation and providing
informative error messages and graceful exits from the function if the
data proves unsuitable. However, good error handling becomes
crucial when you broaden the intended audience of your function.
In the section Flow of Control (page 108), we saw one mechanism in
stop for implementing graceful exits from functions. The stop
function immediately stops evaluation of the current function, issues
an error message, and then dumps debugging information to a data
object named last.dump. The last.dump object is a list that can either
be printed directly or reformatted using the traceback function. For
example, here is the error message and debugging information
returned by the my.ran function from page 112:
> traceback()
6: eval(action, sys.parent())
5: doErrorAction("Problem in my.ran(10, distribution =
\"unif\"): distribution must be \"gamma\", \"exp\", or
\"norm\"",
4: stop("distribution must be \"gamma\", \"exp\", or
\"norm\"")
3: my.ran(10, distribution = "unif")
2: eval(expression(my.ran(10, distribution = "unif")))
1:
Message: Problem in my.ran(10, distribution = "unif"):
distribution must be "gamma", "exp", or "norm"
> options()$error
expression(dump.calls())
The warning function is similar to stop, but does not cause S-PLUS to
stop evaluation. Instead, S-PLUS continues evaluating after the
warning message is printed to the screen. This is a useful technique
for warning users about potentially hazardous conditions such as data
coercion:
if (!is(x, "numeric")) {
    warning("Coercing to mode numeric")
    x <- as(x, "numeric")
}
Data Input
Most data input to S-PLUS functions is in the form of named objects
passed as required arguments to the functions. For example:
> mean(corn.rain)
[1] 10.78421
> mean(c(5,9,23,42))
[1] 19.75
> 7 + 3
[1] 10
Input and Output
> a <- 7 + 3
> options()$width
[1] 80
> options()$length
[1] 48
> options(digits=17)
> pi
[1] 3.1415926535897931
You can also change the digits value through the General Settings
dialog; select Options ► General Settings and click on the
Computations tab to see this. It is important to note that any option
changed through the GUI persists from session to session. In contrast,
options changed via the options function are restored to their default
values when you restart S-PLUS. For more details, see the help files for
the options function and the Command Line Options dialog.
> format(sqrt(1:10))
> options(digits=3)
> format(sqrt(1:10))
[1] "1 " "1.41" "1.73" "2 " "2.24" "2.45" "2.65"
[8] "2.83" "3 " "3.16"
To include trailing zeros, you can use the nsmall argument to format,
which sets the minimum number of digits to include after the decimal
point:
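A minimal sketch of the nsmall argument, using a made-up vector, is shown below. No output is reproduced here because the exact display depends on the session's digits setting.

```r
x <- c(1, 2.5, 3)

format(x)                # common width, only as many decimals as needed
format(x, nsmall = 2)    # at least two digits after the decimal point
```

With nsmall = 2, whole numbers such as 1 and 3 are padded with trailing zeros so that every element shows two decimal places.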
Warning
If you want to print numeric values to a certain number of digits, do not use print followed by
round. Instead, use format to convert the values to character vectors and then specify a certain
number of entries. Printing numbers with print involves rounding, and rounding an
already-rounded number can lead to anomalies. To see this, compare the output from the
following two commands, for x <- runif(10):
return(x)
}
Notice that the function has no side effects. All calculations are
assigned to objects in the function’s frame, which are then combined
into a list and returned as the value of the function. This is the
preferred method for returning a number of different results in an
S-PLUS function.
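The pattern of computing several results locally and returning them in one list can be sketched as follows. This is not the original monthly.summary listing: the input layout (one row per day, one column per store) and the body are assumptions, and only the component names and the "dev" attribute are taken from the printed output in this section.

```r
# Sketch: sales is assumed to be a matrix with one row per day and
# one column per store.
monthly.summary <- function(sales) {
    totals <- apply(sales, 2, sum)            # total sales per store
    avg <- mean(totals)
    attr(avg, "dev") <- sqrt(var(totals))     # standard deviation
    best <- order(-totals)[1]                 # store with largest total
    list("Total Sales" = totals, "Average Sales" = avg,
        "Best Store" = best)
}
```

Nothing is printed, plotted, or assigned outside the function's frame; the caller decides what to do with the returned list.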
Suppose we have data files named april.sales and may.sales
containing daily sales information for April and May, respectively.
The following commands show how monthly.summary can be used to
compare the data:
$"Total Sales":
[1] 55 59 91 87 101 183 116 119 78 166
$"Average Sales":
[1] 105.5
attr($"Average Sales", "dev"):
[1] 42.16436
$"Best Store":
[1] 6
> May92
$"Total Sales":
[1] 65 49 71 91 105 163 126 129 81 116
$"Average Sales":
[1] 99.6
attr($"Average Sales", "dev"):
[1] 34.76013
$"Best Store":
[1] 6
Side Effects
A side effect of a function is any result that is not part of the returned
value. Examples include graphics plots, printed values, permanent
data objects, and modified session options or graphical parameters.
Not all side effects are bad; graphics functions are written to produce
side effects in the form of plots, while their return values are usually of
no interest. In such cases, you can suppress automatic printing with
the invisible function, which invisibly returns the value of a
function. Most of the printing functions, such as print.atomic, do
exactly this:
> print.atomic
You should consciously try to avoid hidden side effects because they
can wreak havoc with your data. Permanent assignment from within
functions is the cause of most bad side effects. Many S-PLUS
programmers are tempted to use permanent assignment because it
allows expressions inside functions to work exactly as they do at the
S-PLUS prompt. The difference is that if you type
at the S-PLUS prompt, you are likely to be aware that myobj is about to
be overwritten if it exists. In contrast, if you call a function that
contains the same expression, you may have no idea that myobj is
about to be destroyed.
Writing to Files
In general, writing data to files from within functions can be as
dangerous a practice as permanent assignment. Instead, it is safer to
create special functions that generate output files. Such functions
should include arguments for specifying the output file name and the
format of the included data. The actual writing can be done by a
number of S-PLUS functions, the simplest of which are write,
write.table, cat, sink, and exportData. The write and write.table
functions are useful for retaining the structure of matrices and data
frames, while cat and sink can be used to create free-format data
files. The exportData function creates files in a wide variety of
formats. See Chapter 5, Importing and Exporting, for details.
Functions such as write, cat, and exportData all generate files
containing data; no S-PLUS structure is written to the files. If you wish
to write the actual structure of your S-PLUS data objects to text files,
use the dump, data.dump, or dput functions. We discuss each of these
below.
1 2 3 4 5
6 7 8 9 10
11 12
The mat2.txt file looks similar to the object mat, and contains the
following lines:
1 4 7 10
2 5 8 11
3 6 9 12
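File contents like those shown for mat1.txt and mat2.txt can be produced with the write function, as in the sketch below. The definition of mat is an assumption inferred from the file contents, and the original commands may differ.

```r
mat <- matrix(1:12, nrow = 3)

# Default: values are written in column order, five per line
write(mat, file = "mat1.txt")

# Transposing and setting ncolumns preserves the matrix layout
write(t(mat), file = "mat2.txt", ncolumns = ncol(mat))
```

The transpose is needed because write outputs values in column order; writing t(mat) with four values per line reproduces the rows of mat.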
The argument fill=T limits the width of each line in the output file to
the width value specified in the options list. For more details on the
format function and the width option, see the section Formatting
Output (page 131).
To write to a file with cat, simply specify a file name with the file
argument:
> cat(format(x), file="mydata1.txt")
The sink function directs S-PLUS output into a file rather than to the
screen. It can be used as an alternative to multiple
cat(..., append=T) statements. For example, the following
commands open a sink to a file named mydata2.txt, write x to the file
in three different ways, and then close the sink so that S-PLUS writes
future output to the screen:
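A minimal sketch of such a sequence is given below. The vector x and the three output forms are assumptions chosen for illustration; only the file name mydata2.txt comes from the text.

```r
x <- 1:10

sink("mydata2.txt")      # divert output to the file
print(x)                 # 1: printed form, with the [1] index prefix
cat(x, "\n")             # 2: bare values separated by spaces
cat(format(x), "\n")     # 3: values padded to a common width
sink()                   # restore output to the screen
```

Calling sink with no arguments is what closes the diversion; until then, everything that would normally appear on the screen goes to the file.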
For more examples using sink, see the section Standard Connections
(page 145).
Note
In earlier versions of S-PLUS, the dump function could be used to transfer data objects such as
matrices and lists between machines. This behavior is no longer supported in SV4 versions of
S-PLUS. Currently, dump is used only for creating editable text files of S-PLUS functions; use
data.dump to transfer your data objects between machines. For more details, see the help files for
these two functions.
> tmp.df
x y
1 1 0.54033146
2 2 0.27868110
3 3 0.31963785
4 4 0.26984466
5 5 0.75784146
6 6 0.32501004
7 7 0.90018579
8 8 0.04155586
9 9 0.28102661
10 10 0.09519871
x y
1 1 0.54033146
2 2 0.27868110
3 3 0.31963785
4 4 0.26984466
5 5 0.75784146
6 6 0.32501004
7 7 0.90018579
8 8 0.04155586
9 9 0.28102661
10 10 0.09519871
> tmp.df
Problem: Object "tmp.df" not found
You must assign the output from dget to access its contents in your
working directory:
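The round trip through dput and dget can be sketched as follows. The file name tmpdf.txt and the object name restored are hypothetical, and the data frame is generated afresh rather than being the one printed above.

```r
tmp.df <- data.frame(x = 1:10, y = runif(10))
dput(tmp.df, "tmpdf.txt")        # hypothetical file name

rm(tmp.df)                       # the object is gone from the session
restored <- dget("tmpdf.txt")    # dget only returns the object,
restored$y[1:3]                  # so assign the result in order to use it
```

Without the assignment, dget simply prints the object and nothing is stored.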
Creating Temporary Files
You can use cat, write, and dput together with the tempfile function
to create temporary files that have unique names. Such files are
convenient to use for a variety of purposes, including text processing
tools. For example, the built-in ed function creates a temporary file
that holds the object being edited:
> ed
The temporary files created with tempfile are ordinary files written
to the directory specified by the S_TMP environment variable.
Customarily, this directory is a temporary storage location that is
wiped clean frequently. To prevent overloading this directory, it is
best if you incorporate file cleanup into your functions that utilize
tempfile. This is discussed in the section Wrap-Up Actions (page
158). For more information on S-PLUS environment variables such as
S_TMP, see Chapter 18, The S-PLUS Command Line and the System
Interface.
Connection Classes
Table 4.7 lists the connection classes available in S-PLUS. Each of
these classes extends the virtual class "connection".
Table 4.7: Classes of S-PLUS connections.
Connection Class      Description
All four classes listed in the table are functions that can be used to
(optionally) open the described connections and return S-PLUS
connection objects. Connection objects are one of the primary tools for
managing connections in S-PLUS. For example, the following
command opens a file connection to myfile.dat and assigns the value
to the connection object filecon.
The side effect of the call to file opens the connection, so you may
be tempted to think that the returned object is of little interest.
However, conscientious use of connection objects results in cleaner
and more flexible code. For example, you can use these objects to
delay opening particular connections. Each connection class has an
optional argument open that can be used to suppress opening a
connection. With the returned connection object, you can use the
open function to explicitly open the connection when you need it:
Standard Connection   Description
Connection Modes
By default, file, fifo, and pipe connections are opened for both
reading and writing, appending data to the end of the connection if it
already exists. While this behavior is suitable for most applications,
you may require different modes for certain connections. Example
situations include:
• Opening a file connection as read-only so that it is not
accidentally overwritten.
• Opening a file connection so that any existing data on it is
overwritten, rather than appended to the end of it.
You can change the default mode of most connections through the
mode argument of the open function. For example, the following
commands open a file connection as write-only. If we try to read from
the connection, S-PLUS returns an error:
> scan(filecon)
Problem in scanDefault(file, what, n): "myfile.dat" already
opened for "write only": use reopen() to change it
As the error message suggests, you can use the reopen function to
close the connection and reopen it with a different value for mode.
Note
The mode of a textConnection cannot be changed. By design, text connections are read-only.
Instead of explicitly calling open, you can supply the desired mode
string to the open argument of one of the connection classes. Thus, the
following command illustrates a different way of opening a file as
write-only:
Table 4.9 lists the most common mode strings used to open
connections in S-PLUS.
Table 4.9: Common modes for S-PLUS connections.
Support Functions for Connections
The functions listed in the two tables below provide support for
managing connections in your S-PLUS session: Table 4.10 describes
functions that allow you to see any active connections and Table 4.11
describes functions that prepare connections for reading or writing.
We have already seen the open and close functions in previous
sections. In the text below, we describe each of the remaining support
functions.
Table 4.10: S-PLUS functions for managing active connections.
Table 4.11: Support functions that prepare connections for reading or writing.
> showConnections()
> close(getConnection(52))
[1] T
> close(getConnection("mydata2.txt"))
[1] T
• A file connection.
• The argument where, which is a position measured in bytes
from the start of the file.
• The argument rw, which determines whether the "read" or
"write" position is modified.
For pipe and fifo connections, data is read in the same order in
which it is written. Thus, there is no concept of a "read" position for
these connections. Likewise, data is always written to the end of pipes
and fifos, so there is also no concept of a "write" position. For
textConnection objects, only "read" positions are defined.
Reading from and Writing to Connections
Table 4.12 lists the main S-PLUS functions for reading from and
writing to connections. Wherever possible, we pair functions in the
table so that relationships between the reading and writing functions
are clear. For details on the scan, cat, data.restore, data.dump,
source, dump, dget, and dput functions, see the section Writing to
Files (page 137). For details on readRaw and writeRaw, see the section
Raw Data Objects (page 154). For examples using any of these
functions, see the on-line help files.
Table 4.12: S-PLUS functions for reading from and writing to connections. The first column in the table lists
functions for reading; the second column lists the corresponding writing functions (if any).
Reading Function   Writing Function   Description
readLines          writeLines         Read n lines and return one character
                                      vector per line. Write n lines,
                                      consisting of one character vector per
                                      line.
Examples of Pipe Connections
The examples throughout most of this section deal mainly with file
connections. This is because files are often the easiest of the
connection classes to visualize applications for, while pipes and fifos
2 3 5 7 11
13 17 19 23 29
31 37 41 43 47
53 59 61 67 71
73 79 83 89 97
To compress the file and write the results in primes.gz, issue the
following system command:
gzip -c primes.txt > primes.gz
The following commands read the compressed file in S-PLUS:
[1] 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61
[19] 67 71 73 79 83 89 97
8.4,2.8,2.0,4.2,
4.5,0.3,
8.1,7.3,0.4,
6.1,7.2,8.3,0.6,0.7,
3.7,
The process that generated the file placed a comma at the end of each
line. If you use the scan function to read this file, S-PLUS includes an
extra NA after each trailing comma. Instead, you can remove the
trailing commas and read the data into S-PLUS as follows:
[1] 8.4 2.8 2.0 4.2 4.5 0.3 8.1 7.3 0.4 6.1 7.2 8.3 0.6
[14] 0.7 3.7
Using Perl, you can replace the tabs and spaces between each pair of
numbers with a single space. You can then read the file into S-PLUS by
specifying a single white space as the delimiter. The following
commands show how to do this:
[1] 4.02 4.00 2.03 1.62 4.67 2.15 2.00 4.83 4.87 2.00
[11] 4.00 4.38 1.83 4.38 4.73 4.00 4.28 5.45 1.77 4.22
Raw Data Objects
Raw data objects are structures that consist of undigested bytes of data.
They can be thought of naturally as vectors of byte data. You can
manipulate these objects in S-PLUS with the usual vector functions to
extract subsets, replace subsets, compute lengths and define lengths.
In addition, raw data can be passed as arguments to functions,
included as slots or components in other objects, and assigned to any
database. However, raw data objects are not numeric and
cannot be interpreted as ordinary, built-in vectors. S-PLUS provides
no interpretation for the contents of the individual bytes: they don’t
have an intrinsic order, NAs are not defined, and coercion to numbers
or integers is not defined. The only comparison operators that make
sense in this setting are equality and inequality, interpreted as
comparing two objects overall.
In S-PLUS, raw data is usually generated in four basic ways:
1. Read the data from a file or other connection using the
functions readMapped or readRaw. Conversely, you can write
raw data to a file or connection using writeRaw.
2. Use character strings that code bytes in either hex or ASCII
coding. The character strings can then be given to the
functions rawFromHex and rawFromAscii to generate the raw
data.
3. Allocate space for a raw object and then fill it through a call to
C code via the .C interface.
4. Call an S-PLUS-dependent C routine through the .Call
interface.
See Chapter 15, Interfacing With C and Fortran Code, for details on
.C and .Call interfaces. For details on additional topics not discussed
here, see Chambers (1998).
The primary S-PLUS constructors for raw data are the rawData and
raw functions. The four approaches mentioned above usually arise
more often in practice, however. All raw data objects in S-PLUS have
class "raw", regardless of how they are generated.
Examples
Raw Data on Files and Connections
The readMapped function reads binary data of numeric or integer
modes from a file. Typical applications include reading data written
by another system or by a C or Fortran program. The function also
provides a way to share data with other systems, assuming you know
where the systems write data.
Examples
The following example writes twenty integers to a raw data file
named x.raw, and then reads the values back in using the readRaw
function.
To ensure the data are read into S-PLUS as integers, set the argument
what to integer() in the call to readRaw:
The next command reads only the first 10 integers into S-PLUS:
[1] 5 5 5 5 5 10 10 10 10 10
You can determine the amount of data that is read into S-PLUS in one
of two ways: the length argument to readRaw or the length of the what
argument. If length is given and positive, S-PLUS uses it to define the
size of the resulting S-PLUS object. Otherwise, the length of what (if
positive) defines the size. If length is not given and what has a length
of zero, all of the data on the file or connection is read.
The following example writes twenty double-precision numbers to a
raw data file named y.raw, and then reads the values back in using
readRaw. Note that the values in the vector y must be explicitly
coerced to doubles using the as.double function, so that S-PLUS does
not interpret them as integers.
To ensure the data are read into S-PLUS as double precision numbers,
set the argument what=double() in the call to readRaw:
WRAP-UP ACTIONS
The more complicated your function, the more likely it is to complete
with some loose ends dangling. For example, the function may create
temporary files, or alter S-PLUS session options and graphics
parameters. It is good programming style to write functions that run
cleanly without permanently changing the environment. Wrap-up
actions allow you to clean up loose ends in your functions.
The most important wrap-up action is to ensure that a function
returns the appropriate value or generates the desired side effect.
Thus, the final line of a function is often the name of the object to be
returned or an expression that constructs the object. See the section
Constructing Return Values (page 134) for examples.
To restore session options or specify arbitrary wrap-up actions, use the
on.exit function. With on.exit, you ensure the desired actions are
carried out whether or not the function completes successfully. For
example, highly recursive functions often overrun the default limit for
nested expressions. The expressions argument to the options
function governs this and is set to 256 by default. Here is a version of
the factorial function that raises the limit from 256 to 1024 and then
cleans up:
The first line of fac1024 assigns the old session options to the object
old, and then sets expressions=1024. The call to on.exit resets the
old options when the function finishes. The Recall function is used to
make recursive calls in S-PLUS.
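A minimal sketch of a function matching this description follows. It is an assumed reconstruction built from the text (save the old options, raise expressions, restore them with on.exit, recurse with Recall), not necessarily the original definition.

```r
fac1024 <- function(n)
{
    old <- options(expressions = 1024)  # save old options, raise the limit
    on.exit(options(old))               # restore them however we exit
    if(n <= 1) { return(1) }
    else { n * Recall(n - 1) }          # recursive call via Recall
}
```

Because the restore is registered with on.exit, the old limit comes back even if the recursion stops with an error.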
Compare fac1024 with a function that uses the default limit on nested
expressions:
fac256 <- function(n)
{
    if(n <= 1) { return(1) }
    else { n * Recall(n-1) }
}
Here is the response from S-PLUS when each function is called with
n=80.0:
> fac1024(80.0)
[1] 7.156946e+118
> fac256(80.0)
Note
As defined, the fac1024 function must be called with a real argument such as 80.0. If you call it
with an integer such as 80, S-PLUS overflows and returns NA. See the section Integer Arithmetic
(page 83) for a full discussion of this behavior.
To remove temporary files, you can use on.exit together with the
unlink function. For example:
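One way this can look, as a sketch: the function round.trip and the file prefix "rt" are hypothetical names chosen for illustration. The function writes its argument to a scratch file, reads it back, and relies on on.exit to delete the file.

```r
round.trip <- function(x)
{
    tmp <- tempfile("rt")       # scratch file name (hypothetical prefix)
    on.exit(unlink(tmp))        # remove the file however we leave
    write(x, tmp, ncol = 1)     # one value per line
    list(values = scan(tmp), file = tmp)
}

res <- round.trip(c(1.5, 2.5, 4))
res$values
```

The file name is returned only so that you can confirm afterwards that the file no longer exists.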
If add=F, the new action replaces any pending wrap-up actions. For
example, suppose your function performs a long, iterative
computation and you want to write the last computed value to disk in
case of an error. You can use on.exit to accomplish this as follows:
If we call this function and then interrupt the computation with ESC,
we see that the object intermediate.result is created. If we let the
function complete, it is not:
> fcn.C()
User interrupt requested
Use traceback() to see the call stack
> intermediate.result
[1] 665856
> rm(intermediate.result)
> fcn.C()
[1] 1e+08
> intermediate.result
Problem: Object "intermediate.result" not found
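The same pattern can be sketched without relying on an interrupt. In the hypothetical function iterate below, a simulated error stands in for the ESC key, and the global name last.value plays the role of intermediate.result. Calling on.exit() with no arguments cancels the pending action once the loop has finished.

```r
last.value <- NA

iterate <- function(n, fail.at = 0)
{
    x <- 0
    on.exit(last.value <<- x)   # save progress if we exit early
    for(i in 1:n) {
        if(i == fail.at) stop("simulated failure")
        x <- x + i
    }
    on.exit()                   # success: cancel the pending action
    x
}

iterate(4)                      # completes; last.value is left untouched
```

If the call is made with a fail.at value inside the loop range, the function stops early and last.value holds the partial sum accumulated up to that point.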
> get("%*%")
function(x, y, ...)
UseMethod("%*%")
Writing Special Functions
Once defined, this operator can be used exactly as any other infix
operator:
[,1] [,2]
[1,] 2 1
[2,] 1 1
> x %^% 3
[,1] [,2]
[1,] 13 8
[2,] 8 5
You can also use this operator to find the inverse of a matrix:
> x %^% -1
[,1] [,2]
[1,] 1 -1
[2,] -1 2
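An operator that behaves as shown in this output can be sketched as follows. The body is an assumption (repeated multiplication, with solve used for negative powers), not necessarily the definition used to produce the output above.

```r
"%^%" <- function(X, k)
{
    if(k < 0) {
        X <- solve(X)           # negative powers use the inverse
        k <- -k
    }
    result <- diag(nrow(X))     # start from the identity matrix
    while(k > 0) {
        result <- result %*% X
        k <- k - 1
    }
    result
}

x <- matrix(c(2, 1, 1, 1), nrow = 2)
x %^% 3
x %^% -1
```

Repeated multiplication is the simplest choice here; for large powers a squaring scheme would need fewer products.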
Extraction and Replacement Functions
As we mention in the section Function Names and Operators (page
75), S-PLUS handles assignments in which the left side is a function
call differently from those in which the left side is a name. An
expression of the form f(x) <- value is evaluated as the following
assignment:
x <- "f<-"(x, value)
[,1] [,2]
[1,] 1 6
[2,] 2 7
[3,] 3 8
[4,] 4 9
[5,] 5 10
> dim(x)
[1] 5 2
The result from dim states that the matrix x has 5 rows and 2 columns.
The corresponding function "dim<-" replaces the dim attribute with a
user-specified value:
> get("dim<-")
function(x, value)
.Internal("dim<-"(x, value), "S_replace", T, 10)
Two things are worth noting about the definition of "doc<-". First, it
returns the complete, modified object and not just the modified
attribute. Second, it performs no assignment; the S-PLUS evaluator
performs the actual assignment. These characteristics are essential for
writing clean replacement functions.
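A replacement function with these two properties can be sketched as follows. The bodies shown are assumptions consistent with the description, not necessarily the original definitions of doc and "doc<-".

```r
# Replacement function: returns the whole modified object and
# performs no assignment itself.
"doc<-" <- function(x, value)
{
    attr(x, "doc") <- value
    x
}

# Matching extraction function.
doc <- function(x) attr(x, "doc")

y <- 1:3
doc(y) <- "Three small integers"
doc(y)
```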
The following commands use the "doc<-" function to add a doc
attribute to the built-in data set geyser. The attribute is then printed
with the doc function:
> doc(geyser)
Because of the newline characters, this is not the most readable form.
However, if we modify the doc function slightly to use cat instead, we
obtain output that is easier to read:
You can build extraction functions to extract almost any piece of data
that you are interested in. Such functions typically use other
extraction functions as their starting points. For example, the
following functions use subscripting to find the elements of an input
vector that have even and odd indices:
> evens(1:10)
[1] 2 4 6 8 10
> odds(1:10)
[1] 1 3 5 7 9
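Functions that give this output can be sketched with logical subscripts. The definitions below are assumptions; other subscripting styles would work equally well.

```r
# Keep the elements whose index is even (or odd).
evens <- function(x) x[seq(along = x) %% 2 == 0]
odds  <- function(x) x[seq(along = x) %% 2 == 1]

evens(1:10)
odds(1:10)
```

Testing the index rather than the value means the functions work on any vector, including character vectors.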
> rownames(state.x77)
[1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L"
[13] "M" "N" "O" "P" "Q" "R" "S" "T" "U" "V" "W" "X"
[25] "Y" "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k"
[37] "l" "m" "n" "o" "p" "q" "r" "s" "t" "u" "v" "w"
[49] "x" "y"
REFERENCES
Chambers, J.M. (1998). Programming with Data: A Guide to the S
Language. New York: Springer-Verlag.
Venables, W.N. and Ripley, B.D. (2000). S Programming. New York:
Springer-Verlag.
IMPORTING AND EXPORTING
Chapter 5 Importing and Exporting
Supported File Types for Importing and Exporting
Table 5.1: Supported file types for importing and exporting data.

Format              Type        Default Extension  Notes
Gauss Data File     "GAUSS",    .dat               Automatically reads the related DHT
                    "GAUSS96"                      file, if any, as GAUSS 89. If no DHT
                                                   file is found, reads the .DAT file as
                                                   GAUSS96.
ODBC Database       "ODBC"      Not applicable     For Informix (.ifx), Oracle (.ora), and
                                                   SYBASE (.syb) databases.
SAS Transport File  "SAS_TPT"   .xpt, .tpt         Version 6.x. Some special export
                                                   options may need to be specified in
                                                   your SAS program. We suggest using
                                                   the SAS Xport engine (not PROC
                                                   CPORT) to read and write these files.
SPSS Data File      "SPSS"      .sav               OS/2; Windows; HP, IBM, Sun, DEC
                                                   UNIX.
IMPORTING DATA
Using the importData Function
The principal tool for importing data is the importData function,
which can be invoked from either the S-PLUS prompt or the File >
Import Data menu option.
In most cases, all you need to do to import a data file is to call
importData with the name of the file to be imported as the only
argument. As long as the specified file has one of the default
extensions listed in Table 5.1, you need not specify a type nor, in most
cases, any other information.
For example, suppose you have a SAS data file named rain.sd2 in
your start-up folder. You can read this file into S-PLUS using
importData as follows:
If you have trouble reading the data, most likely you just need to
supply additional arguments to importData to specify extra
information required by the data importer to read the data correctly.
Table 5.2 lists the arguments to the importData function.
Table 5.2: Arguments to importData.

Argument            Required or Optional  Description
type                Optional              A character string specifying the file type of
                                          the file to be imported. See the "Type" column
                                          of Table 5.1 for a list of possible values.
pageNumber          Optional              The page number of the spreadsheet (used only
                                          for spreadsheets).
sortFactorLevels    Optional              A logical flag. If TRUE, levels for any factors
                                          created from strings are sorted.
valueLabelAsNumber  Optional              A logical flag. If TRUE, SAS and SPSS variables
                                          with labels are imported as numbers.
readAsTable         Optional              A logical flag. If TRUE, S-PLUS reads the entire
                                          file as a single table.
Filter Expressions
The filter argument to importData allows you to subset the data you
import. By specifying a query, or filter, you gain additional
functionality, such as taking a random sample of the data. Use the
following examples and explanation of the filter syntax to create your
statement. A blank filter is the default and results in all data being
imported.
Note
The filter argument is ignored if the type argument (or, equivalently, file extension specified in
the file argument) is set to "ASCII" or "FASCII".
Case selection
You select cases by using a case-selection statement in the filter
argument. The case-selection or where statement has the following
form:
Warning
The syntax used in the filter argument to importData and exportData is not standard S-PLUS
syntax, and the expressions described are not standard S-PLUS expressions. Do not use the
syntax described in this section for any purpose other than passing a filter argument to
importData or exportData.
Variable expressions
You can specify a single variable or an expression involving several
variables. All of the usual arithmetic operators (+ - * / ()) are
available for use in variable expressions, as well as the relational
operators listed in Table 5.3.
Operator   Description
==         Equal to
!=         Not equal to
&          And
|          Or
!          Not
Examples
Examples of selection conditions given by filter expressions are:
"account = ????22"
"id = 3*"
The first statement will select any accounts that have 2s as the 5th and
6th characters in the string, while the second statement will select
strings of any length that begin with 3.
"state = CA,WA,OR,AZ,NV"
"caseid != 22*,30??,4?00"
Missing variables
You can test whether a variable is missing by comparing it to the
special internal variable NA. For example:
The s denotes a string data type, the f denotes a float data type
(actually, numeric), and the asterisk (*) denotes a “skipped” column.
These are the only allowable format types.
If you do not specify the data type of each column, S-PLUS looks at
the first row of data to be read and uses the contents of this row to
determine the data type of each column. A row of data must always
end with a new line.
S-PLUS auto-detects the file delimiter from a preset list that includes
commas, spaces, and tabs. All cells must be separated by the same
delimiter (that is, each file must be comma-separated, space-
The numbers denote the column widths, s denotes a string data type,
f denotes a float data type, and the asterisk (*) denotes a “skip.” You
may need to skip characters when you want to avoid importing some
characters in the file. For example, you may want to skip blank
characters or even certain parts of the data.
If you want to import only some of the rows, specify a starting and
ending row.
If each row ends with a new line, S-PLUS treats the newline character
as a single character-wide variable that is to be skipped.
Lotus files
If your Lotus-type worksheet contains numeric data only in a
rectangular block, starting in the first row and column of the
worksheet, then all you need to specify is the file name and file type.
If a row contains names, specify the number of that row in the
colNameRow argument (it does not have to be the first row). You can
select a rectangular subset of your worksheet by specifying starting
and ending columns and rows. Lotus-style column names (for
example, A, AB) can be used to specify the starting and ending
columns.
The row specified as the starting row is always read first to determine
the data types of the columns. Therefore, there cannot be any blank
cells in this row. In other rows, blank cells are filled with missing
values.
dBASE files
S-PLUS imports dBASE and dBASE-compatible files. The file name
and file type are often the only things you need to specify for
dBASE-type files. Column names and data types are obtained from the
dBASE file. However, you can select a rectangular subset of your data
by specifying starting and ending columns and rows.
You must specify the data source name if you do not specify the user
ID, password, server, and driver attributes. However, all other
attributes are optional. If you do not specify an attribute, that attribute
defaults to the value specified in the relevant DSN tab of the ODBC
Data Source Administrator.
Note
"DSN=Employees;UID=joesmith;PWD=secret;SERVER=hr.db"
Note
ODBC import and export facilities do not support "nchar" or "nvarchar" data types. The
"varchar" type is supported.
You can use the filter argument in the importData function to filter
data, as described on page 180.
To export data from S-PLUS via ODBC, use the standard exportData
function with the type="ODBC" argument. Four additional parameters
control the call to the ODBC interface:
• data supplies the data frame to be exported;
• file supplies the name of the data source;
• odbcConnection supplies the ODBC connection string;
• odbcTable supplies the name of the table to be created.
For example, this command exports the data frame myDataSet to
Table23 of data source testSQLServer:
exportData(data="myDataSet", file="testSQLServer",
    type="ODBC", odbcConnection =
    "DSN=testSQLServer;UID=joesmith;PWD=secret;APP=S-PLUS;WSID=joesComputer;DATABASE=testdba",
    odbcSqlQuery="Select * from testdba.dbo.Table23"
)
where
odbcConnection is the connection string to the database
odbcSqlQuery is the statement passed to the database
returnData is the flag to return the data (default=F)
The following is an example of adding a record to an existing table:
Other Data Import Functions
While importData is the recommended method for reading data files
into S-PLUS, there are several other functions that you can use to read
ASCII data. These functions are commonly used by other functions
in S-PLUS, so it is a good idea to familiarize yourself with them.
The scan Function
The scan function, which can read either from standard input or from
a file, is commonly used to read data from keyboard input. By default,
scan expects numeric data separated by white space, although there
are options that let you specify the type of data being read and the
separator. When using scan to read data files, it is helpful to think of
each line of the data file as a record, or case, with individual
observations as fields. For example, the following expression creates a
matrix named x from a data file specified by the user:
Here the data file is assumed to have 10 columns of numeric data; the
matrix contains a number of observations for each of these ten
variables. To read in a file of character data, use scan with the what
argument:
Any character vector can be used in place of "". For most efficient
memory allocation, what should be the same size as the object to be
read in. For example, to read in a character vector of length 1000, use
> scan(what=character(1000))
The what argument to scan can also be used to read in data files of
mixed type, for example, a file containing both numeric and
character data, as in the following sample file, table.dat:
Tom 93 37
Joe 47 42
Dave 18 43
In this case, you provide a list as the value for what, with each list
component corresponding to a particular field:
[[2]]:
[1] 93 47 18
[[3]]:
[1] 37 42 43
S-PLUS creates a list with separate components for each field specified
in the what list. You can turn this into a matrix, with the subject names
as column names, as follows:
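The whole sequence can be sketched end to end. The example below first creates a scratch copy of table.dat (the temporary file is an assumption made so that the example is self-contained), scans it with a list for what, and then binds the two numeric fields into a matrix whose column names are the subject names.

```r
tmp <- tempfile("table")
cat("Tom 93 37\nJoe 47 42\nDave 18 43\n", file = tmp)

z <- scan(tmp, what = list("", 0, 0))   # one list component per field
mat <- rbind(z[[2]], z[[3]])            # numeric fields become rows
dimnames(mat) <- list(NULL, z[[1]])     # subject names label the columns
unlink(tmp)
mat
```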
You can scan files containing multiple line records by using the
argument multi.line=T. For example, suppose you have a file
heart.all containing information in the following form:
johns 1
450 54.6
marks 1 760 73.5
. . .
> scan("heart.all", what=list("",0,0,0), multi.line=T)
[[1]]:
[1] "johns" "marks" "avery" "able" "simpson"
. . .
[[4]]:
[1] 54.6 73.5 50.3 44.6 58.1 61.3 75.3 41.1 51.5 41.7 59.7
[12] 40.8 67.4 53.3 62.2 65.5 47.5 51.2 74.9 59.0 40.5
If your data file is in fixed format, with fixed-width fields, you can use
scan to read it in using the widths argument. For example, suppose
you have a data file dfile with the following contents:
01giraffe.9346H01-04
88donkey .1220M00-15
77ant L04-04
20gerbil .1220L01-12
22swallow.2333L01-03
12lemming L01-23
You can now read the data in dfile into S-PLUS calling scan as
follows:
The read.table Function
Data frames in S-PLUS were designed to resemble tables. They must
have a rectangular arrangement of values and typically have row and
column labels. Data frames arise frequently in designed experiments
and other situations. If you have a text file with data arranged in the
form of a table, you can read it into S-PLUS using the read.table
function. For example, consider a data file named auto.dat that
contains the records listed below.
All fields are separated by spaces, and the first line is a header line. To
create a data frame from this data file, use read.table as follows:
As with scan, you can use read.table within functions to hide the
mechanics of S-PLUS from the users of your functions.
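As a self-contained sketch of reading a space-separated file with a header line, the example below writes a small scratch file and reads it with read.table. The file contents and column names are hypothetical stand-ins, not the records of auto.dat.

```r
tmp <- tempfile("auto")
# Hypothetical records: a header line followed by two data lines.
cat("Model Price Mileage\n",
    "Integra 11950 26\n",
    "Legend 24760 20\n",
    file = tmp, sep = "")

auto <- read.table(tmp, header = T)   # first line supplies column names
unlink(tmp)
auto
```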
EXPORTING DATA
Using the exportData Function
You use the exportData function to export S-PLUS data objects to
formats for applications other than S-PLUS. (To export data for use by
S-PLUS, use the data.dump function; see page 194.) You can invoke
exportData from either the S-PLUS prompt or the File > Export
Data menu option.
When exporting to most file types with exportData, you typically
need to specify only the data set, file name, and (depending on the file
name you specified) the file type, and the data will be exported into a
new data file using default settings. For greater control, you can
specify your own settings by using additional arguments to
exportData. Table 5.4 lists the arguments to the exportData function.
Argument   Required or Optional   Description
rowNames   Optional               A logical flag. If TRUE, row names are also exported.
Other Data Export Functions
In addition to the exportData function, S-PLUS provides several other
functions for exporting data, discussed below.
The data.dump Function
When you want to share your data with another S-PLUS user, you can
export your data to an S-PLUS file format by using the data.dump
function:
> data.dump("matz")
Hint
The connection argument needn’t specify a file; it can specify any valid S-PLUS connection
object.
If the data object you want to share is not in your working data, you
must specify the object’s location in the search path with the where
argument:
The cat and write Functions
The inverse operation to the scan function is provided by the cat and
write functions. The result of either cat or write is just an ASCII file
with data in it; there is no S-PLUS structure written to the file. Of the
two commands, write has an argument for specifying the number of
columns and thus is more useful for retaining the format of a matrix.
The cat function is a general-purpose writing tool in S-PLUS, used for
writing to the screen as well as writing to files. It can be useful in
creating free-format data files for use with other software, particularly
when used with the format function:
The argument fill=T limits line length in the output file to the width
specified in your options object. To use cat to write to a file, simply
specify a file name with the file argument:
Note
The files written by cat and write do not contain S-PLUS structure information. To read them
back into S-PLUS, you must reconstruct this information.
> mat
[,1] [,2] [,3] [,4]
[1,] 1 4 7 10
[2,] 2 5 8 11
[3,] 3 6 9 12
> write(t(mat), "mat", ncol=4)
You can view the resulting file with a text editor; it contains the
following three lines:
1 4 7 10
2 5 8 11
3 6 9 12
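Because the file holds only the values, reading it back means supplying the shape again. A minimal round trip, using a temporary file name in place of "mat" (an assumption made to keep the example self-contained), might look like this:

```r
mat <- matrix(1:12, nrow = 3)
tmp <- tempfile("mat")

write(t(mat), tmp, ncol = 4)      # transpose so rows are written as rows
mat2 <- matrix(scan(tmp), ncol = 4, byrow = T)   # rebuild the 3 x 4 shape
unlink(tmp)
mat2
```

The byrow=T argument matters: scan returns the values in file order, which is row by row, while matrix fills by column unless told otherwise.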
EXPORTING GRAPHS
The export.graph function is used to export a graph named Name to
the file FileName using the file format specified by ExportType. Table
5.5 lists the arguments to the export.graph function.
Table 5.5: Arguments to export.graph.

Argument    Required or Optional   Description
ColorBits   Optional               An integer value that specifies the color bits
                                   value used when saving an image. For a complete
                                   discussion of this argument, see page 199.
Specifying the ExportType Argument
Some of the most common values for the ExportType argument
include "BMP", "WMF", "EPS", "EPS TIFF", "TIF", "GIF", "JPG", "PNG",
"IMG", "EXIF", "PCT", "TGA", and "WPG". If this argument is not
specified, the file type is inferred from the extension used in the
FileName argument.
Table 5.6 describes the map between file extensions and file types. If
FileName does not include an extension from Table 5.6, one is added
based on the value of this argument. To export a graph to a file that
does not have an extension, specify the appropriate ExportType
format and end the FileName character string with a period.
Table 5.6: Map between file extensions and file types for the ExportType argument.

.jpg   JPG   JPEG File Interchange Format with YUV 4:4:4 color space
Specifying the QFactor Argument
The QFactor argument is a number that determines the degree of loss
in the compression process when saving an image file to the following
ExportType formats: "CMP", "JPG", "JPG YUV4", "JPG YUV2", "JPG YUV1",
"TIF JPG", "TIF JPG YUV4", "TIF JPG YUV2", "TIF JPG YUV1",
and "EXIF JPG". The valid range is from 2 to 255, with 2 resulting in
perfect quality and 255 resulting in maximum compression. The
default value is 2.
Note
The effect of this argument is identical to the “quality” parameter (0-100%) used in most
applications that view and convert JPEG graphics.
Specifying the ColorBits Argument
Valid options for each format are listed in Table 5.7. The default is to
use the maximum value supported by the requested format. This
argument is ignored for the following ExportType formats: "EMF",
"EPS", "EPS TIFF", "EPS WMF", and "WMF".
Compressed TIFF
"TIF JPG" or "TIF JPG YUV4"  Tagged Image File with JPEG compression    8, 24
                             and YUV 4:4:4 color space
"TIF JPG YUV2"               Tagged Image File with JPEG compression    8, 24
                             and YUV 4:2:2 color space
"TIF JPG YUV1"               Tagged Image File with JPEG compression    8, 24
                             and YUV 4:1:1 color space
"TIF PACK"                   Tagged Image File with PackBits            1, 2, 3, 4, 5, 6, 7, 8,
                             compression and RGB color space            16, 24, 32
"TIF PACK CMYK"              Tagged Image File with PackBits            24, 32
                             compression and CMYK color space
"TIF PACK YCC"               Tagged Image File with PackBits            24
                             compression and YCbCr color space
"CCITT"                      TIFF, compressed using CCITT
"CCITT G3 1D"                TIFF, compressed using CCITT, group 3,
                             1 dimension
"CCITT G3 2D"                TIFF, compressed using CCITT, group 3,
                             2 dimensions
"CCITT G4"                   TIFF, compressed using CCITT, group 4

BMP Formats
"BMP"                        Windows BMP, with no compression           1, 4, 8, 16, 24, 32
"BMP RLE"                    Windows BMP, with RLE compression          4, 8
"OS2"                        OS/2 BMP version 1.x                       1, 4, 8, 24
"OS2 2"                      OS/2 BMP version 2.x                       1, 4, 8, 24

Exif Formats
"EXIF"                       Exif file containing a TIFF image, no      24
                             compression with RGB color space
"EXIF YCC"                   Exif file containing a TIFF image, no      24
                             compression with YCbCr color space
"EXIF JPG"                   Exif file containing a JPEG compressed     24
                             image
"EXIF 411"                   Exif 2.0 file containing a JPEG            24
                             compressed image
Creating HTML Output
Text
The sink function may be used to direct S-PLUS text output to an
HTML file. The output can be enclosed in the HTML markup tags
<PRE> and </PRE> to mark it as preformatted output.
Additional textual description and HTML markup tags may be
interspersed with the S-PLUS output using cat.
> sink("my.htm")
> cat("<H3> Linear Model Results </H3> \n")
> cat("<PRE>")
> summary(lm(Mileage~Weight, fuel.frame))
> cat("</PRE>")
> sink()
The paste and deparse functions are useful for constructing strings to
display with cat. See their help files for details.
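As a small illustration of building markup with paste, the sketch below wraps a title in a heading tag and writes a short preformatted summary to a temporary file. The helper html.heading and the file prefix "report" are hypothetical names introduced for this example.

```r
# Hypothetical helper: wrap text in an HTML heading tag of a given level.
html.heading <- function(text, level = 3)
    paste("<H", level, "> ", text, " </H", level, ">\n", sep = "")

out <- tempfile("report")
sink(out)                                   # divert output to the file
cat(html.heading("Linear Model Results"))
cat("<PRE>\n")
print(summary(c(1, 2, 3, 4)))               # any printed S-PLUS output
cat("</PRE>\n")
sink()                                      # restore normal output
```

Keeping the tag construction in one function means the heading level can be changed in a single place.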
6 DEBUGGING YOUR FUNCTIONS
Introduction
Basic S-PLUS Debugging
Printing Intermediate Results
Using recover
Interactive Debugging
Starting the Inspector
Examining Variables
Controlling Evaluation
Entering, Marking, and Tracking Functions
Entering Functions
Marking Functions
Marking the Current Expression
Viewing and Removing Marks
Tracking Functions
Modifying the Evaluation Frame
Error Actions in the Inspector
Other Debugging Tools
Using the S-PLUS Browser Function
Using the S-PLUS Debugger
Tracing Function Evaluation
Chapter 6 Debugging Your Functions
INTRODUCTION
Debugging your functions generally takes much longer than writing
them because relatively few functions work exactly as you want them
to the first time you use them. You can (and should) design large
functions before writing a line of code, but because of the interactive
nature of S-PLUS, it is often more efficient to simply type in a smaller
function, then test it and see what improvements it might need.
S-PLUS provides several built-in tools for debugging your functions.
In general, these tools make use of the techniques described in
Chapter 4, Writing Functions in S-PLUS, to provide you with as much
information as possible about the state of the evaluation.
In this chapter, we describe several techniques for debugging S-PLUS
functions using these built-in tools as well as the techniques of
Chapter 19, Computing on the Language, to extend these tools even
further. For a discussion of debugging loaded code, see Chapter 15,
Interfacing With C and Fortran Code. Refer also to Chapter 20, Data
Management, for a detailed discussion of frames.
Basic S-PLUS Debugging
> acf(corn.rain,type="normal")
Problem in switch(itype + 1,: desired type of ACF is
unknown
Use traceback() to see the call stack
Dumped
> traceback()
6: eval(action, sys.parent())
5: doErrorAction("Problem in switch(itype + 1,: desired
type of ACF is unknown",
4: stop("desired type of ACF is unknown")
3: acf(corn.rain, type = "normal")
2: eval(expression(acf(corn.rain, type = "normal")))
1:
Message: Problem in switch(itype + 1,: desired type of
ACF is unknown
Printing Intermediate Results
One of the oldest techniques for debugging, and still widely used, is to
print intermediate results of computations directly to the screen. By
examining intermediate results in this way, you can see if correct
values are used as arguments to functions called within the top-level
function.
This can be particularly useful when, for example, you are using
paste to construct a set of elements. Suppose that you have written a
function to make some data sets, with names of the form datan, where
each data set contains some random numbers:
make.data.sets <-
function(n) {
    names <- paste("data", 1:n)
    for (i in 1:n)
    {
        assign(names[i], runif(100), where = 1)
    }
}
> make.data.sets(5)
S-PLUS reports no errors, so you look for your newly created data set,
data4:
> data4
Error: Object "data4" not found
To find out what names the function actually was creating, put a cat
statement into make.data.sets after assigning names:
> make.data.sets
function(n)
{
names <- paste("data", 1:n)
cat(names, "\n ")
for(i in 1:n)
{ assign(names[i], runif(100), where = 1)
}
}
> make.data.sets(5)
data 1 data 2 data 3 data 4 data 5
The cat function prints the output in the simplest form possible; you
can get more usual-looking S-PLUS output by using print or show
instead (the show function was introduced in S-PLUS 5.0 as a more
object-oriented version of print):
> make.data.sets
function(n)
{
names <- paste("data", 1:n)
print(names)
for(i in 1:n)
{ assign(names[i], runif(100), where = 1)
}
}
> make.data.sets(5)
[1] "data 1" "data 2" "data 3" "data 4" "data 5"
> make.data.sets
function(n)
{ names <- paste("data", 1:n, sep = "")
print(names)
for(i in 1:n)
{ assign(names[i], runif(100), where = 1)
}
}
> make.data.sets(5)
[1] "data1" "data2" "data3" "data4" "data5"
> data4
[1] 0.784289481 0.138882026 0.656852996 0.443559750
[5] 0.651548887 . . .
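The difference between the two versions comes down to the separator used by paste, which defaults to a single space. A short check, using the illustrative variable names spaced and joined:

```r
spaced <- paste("data", 1:3)             # default separator is a space
joined <- paste("data", 1:3, sep = "")   # empty separator
print(spaced)
print(joined)
```

Only the second form produces names such as data1 that can be typed directly at the prompt.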
Using recover
The recover function can be used to provide interactive debugging as
an error action. To use recover, set your error action as follows:
options(error=expression(if(interactive())
recover() else dump.calls()))
Then, for those types of errors that would normally result in the
message “Problem in ... Dumped,” you are instead asked “Debug? Y/
N”; if you answer “Y”, you are put into recover’s interactive debugger,
with a R> prompt. Type ? at the R> prompt to see the available
commands. Use up to move up the frame list, down to move down the
list. As you move to each frame, recover provides you with a list of
local variables. Just type the local variable name to see its current
value. For example, here is a brief session that follows a faulty call to
the sqrt function:
> sqrt(exp)
Debug ? ( y|n ): y
Browsing in frame of x^0.5
Local Variables: .Generic, .Signature, e1, e2
R> ?
Type any expression. Special commands:
`up', `down' for navigation between frames.
`where' # where are we in the function calls?
`dump' # dump frames, end this task
`q' # end this task, no dump
`go' # retry the expression, with corrections made
Browsing in frame of x^0.5
Local Variables: .Generic, .Signature, e1, e2
R> up
Browsing in frame of sqrt(exp)
Local Variables: x
R(sqrt)> x
function(x)
.Internal(exp(x), "do_math", T, 108)
R(sqrt)> x<-exp(1)
R(sqrt)> go
[1] 1.648721
INTERACTIVE DEBUGGING
Although print, show, and cat statements can help you find many
bugs, they aren’t a particularly efficient way to debug functions,
because you need to make your modifications in a text editor, run the
function, examine the output, then return to the text editor to make
further modifications. If you are examining a large number of
assignments, the simple act of adding the print statements can
become wearisome.
Using recover provides interactive debugging, but it has no real
debugging facilities—the ability to step through code a line at a time,
set breakpoints, track functions, and so on.
With the interactive debugging function inspect you can follow the
evaluation of your function as closely as you want, from stepping
through the evaluation expression-by-expression to running the
function to completion, and almost any level of detail in between.
While inspecting you can do any of the following tasks:
• examine variables in the function’s evaluation frame. Thus,
print and cat statements are unnecessary. You can also look
at function definitions.
• track functions called by the current function. You can request
that a message be printed on entry or exit, and that your own
expressions be installed at those locations.
• mark the current expression. If the marked expression occurs
again during the inspection session, evaluation halts at that
point. Functions can be marked as well; evaluation will halt at
the top of a marked function whenever it is called. Marking an
expression or function corresponds to setting a breakpoint.
• enter a function; this allows you to step through a single
function call, without stopping in subsequent calls to the same
function.
• examine the current expression, together with the current
calling stack. The calling stack lets you know how deeply
nested the current expression is, and how you got there.
Starting the Inspector
To start a session with the inspector, call inspect with a specific
function call as an argument. For example, the call to make.data.sets
with n=5 resulted in a problem, so we can try to track it down by
starting inspect as follows:
> inspect(make.data.sets(5))
entering function make.data.sets
stopped in make.data.sets (frame 3), at:
names <- paste("data", 1:n)
d>
Examining Variables
You can obtain a listing of objects in the current evaluation frame with
the inspector instruction objects. For example, in our call to
make.data.sets, we obtain the following listing from objects:
d> objects
[1] ".Auto.print" ".entered." ".name." "n"
d> eval n
[1] 5
make.data.sets
function(n)
{ names <- paste("data", 1:n)
{ for(i in 1:n)
{ assign(names[i], runif(100), where = 1 )
}
}
}
When you use eval or fundef to look at S-PLUS objects, you can in
general just type the name of the object after the instruction, as in the
examples above. Names in S-PLUS that correspond to the inspect
function’s keywords must be quoted when used as names. Thus, if
you want to look at the definition of the objects function, you must
quote the name "objects", because objects is an inspect keyword.
For a complete description of the quoting rules, type help name within
an inspection session. For a complete list of the keywords, type help
keywords.
One important question that arises in the search for bugs is “Which
version of that variable is being used here?” You can answer that
question using the find instruction. For example, consider the
examples fcn.C and fcn.D given in Matching Names and Values on
page 903. We can use find inside the inspector to demonstrate that
the value of x used by fcn.D is not the value defined in fcn.C:
> inspect(fcn.C())
d> resume
d> objects
d> find x
.Data
. . .
d> objects
d> up
fcn.C (frame 3)
d> objects
d> eval x
[1]
complete [loop | function]
    Evaluates to the end of the next for/while/repeat loop, or to the
    point of function return.
debug.options [echo = T|F] [marks = hard|soft]
    With echo=T, expressions are printed before they are evaluated. With
    marks=hard, evaluation always halts at a marked expression. With
    marks=soft it halts only during a resume. Setting marks=soft is
    a way of temporarily hiding marks for do, complete, etc. The
    defaults are: echo=F, marks=hard. With no arguments,
    debug.options displays the current settings.
do [n]
    Evaluates the next n expressions which are at the same level as the
    current one. The default is 1. Thus if evaluation is stopped directly
    ahead of a braced group, do does the entire group.
down [n]
    Changes the local frame for instructions such as objects and eval
    to be n frames deeper than the current one. The default is 1. After any
    movement of the evaluator (step, resume, etc.), the local frame at
    the next stop is that of the function stopped in.
fundef [name]
    Prints the original function definition for name. Default is the current
    function. Tracked and marked functions will have modified function
    definitions temporarily installed; fundef is used to view the original.
    The modified and original versions will behave the same; the
    modified copy just incorporates tracing code.
mark
    Remembers the current expression; evaluation will halt here from
    now on.
mark name1 [name2 ...] [at entry|exit]
    Arranges to stop in the named functions. The default is to stop at
    both entry and exit.
show [tracks | marks | all]
    Displays installed tracks and marks. Default all.
track name1 [name2 ...] [at entry|exit] [print = T|F] [with expr]
    Enables or modifies entry and/or exit tracking for the named
    functions. The default for print is T. You can use any S-PLUS
    expression as expr.
unmark name1 [name2 ...] [at entry|exit]
    Deletes mark points at the named locations in the named functions.
unmark n1 [n2 ...]
    Deletes mark points n1, n2, .... See mark and show.
up [n]
    Changes the local frame for instructions such as objects and eval
    to be n frames higher than the current one. The default is 1. After any
    movement of the evaluator (step, resume, etc.), the local frame at
    the next stop is that of the function stopped in.
Controlling Evaluation
Within the inspector, you can control the granularity at which
expressions are evaluated. For the finest control, use the step
instruction, which by default evaluates the next expression or
subexpression. The inspector automatically determines stopping
points before each expression. Issuing the step instruction once takes
you to the next stopping point. To clarify these concepts, consider
again our call to make.data.sets. You can see the current position
using the where instruction:
d> where
The numbered lines in the output from where represent the call stack;
they outline the frame hierarchy. The position is shown by the lines
d> step
d> step
You can step over several stopping points by typing an integer after
the step instruction. For example, you could step over the complete
expression names <- paste("data", 1:n) with the instruction step 2.
return(for(i in 1:n)
{ assign(names[i], runif(100), where = 1 )
}
...
Entering, Marking, and Tracking Functions
By default, inspect lets you step through the expressions in the
function being inspected. Function calls within the function being
debugged are evaluated atomically. However, you can extend the
step-through capability to such functions using the enter and mark
instructions. You can also monitor calls to a function, without stepping
through them, with the track instruction.
You cannot enter, mark, or track functions that are defined completely by a call to .Internal.
Also, for technical reasons, you cannot enter, mark, or track any of the seven functions listed
below:
Entering Functions
If you want to step through a function in the current expression, and
don’t plan to step through it if it is called again, use the enter
instruction. For example, while inspecting the call
lm(stack.loss ~ stack.x), you might want to step through the function
model.extract. After stepping to the call to model.extract, you issue
the enter instruction:
d> step
d> enter
Marking Functions
To stop in a function each time it is called, use the mark instruction.
For example, the ar.burg function makes several calls to array. If we
want to stop in array while inspecting ar.burg, we issue the mark
instruction and type the name of the function to be marked. By
default, a breakpoint is inserted at the beginning and end of the
function:
entry mark set for array
exit mark(s) set for array
. . .
d> where
d> resume
Marking the Current Expression
You can mark the current expression by giving the mark instruction
with no arguments. This sets a breakpoint at the current expression.
This can be useful, for example, if you are inspecting a function with
an extensive loop inside it. If you want to stop at some expression in
the loop each time the loop is evaluated, you can mark the expression.
For example, consider again the bitstring function, defined in
Chapter 4, Writing Functions in S-PLUS. To check the value of n in
each iteration, you could use mark and eval together as follows. First,
start the inspection by calling bitstring, then step to the first
occurrence of the expression i <- i + 1. Issue the mark instruction,
use eval to look at n, then use resume to resume evaluation of the
loop. Each time the breakpoint is reached, evaluation stops. You can
then use eval to check n again:
> inspect(bitstring(107))
d>
. . .
d> step
d> mark
d> eval n
[1] 53
d> resume
Viewing and Removing Marks
Once you mark an expression, evaluation always stops at that
expression, until you unmark it. The inspector maintains a list of
marks, which you can view with the show instruction:
d> show marks
Marks:
1 : in array:
    data <- as.vector(data)
2 : in aperm:
    return(UseMethod("aperm"))
You can remove items from the list using the unmark instruction. With
no arguments, unmark unmarks the current expression. If the current
expression is not marked, you get a warning message. With one or
more integer arguments, unmark unmarks the expressions associated
with the given numbers:
Marks:
1 : in array:
    data <- as.vector(data)
2 : in aperm:
    return(UseMethod("aperm"))
d> unmark 2
> inspect(ar.burg(lynx))
# Record the entry time in frame 1 and report it.
func.entry.time <-
function(fun)
{
    assign("StartTime", proc.time(), frame=1)
    cat(deparse(substitute(fun)), "entered at time",
        get("StartTime", frame=1), "\n ")
}
# Record the exit time in frame 1 and report the time elapsed since entry.
func.exit.time <-
function(fun)
{
    assign("StopTime", proc.time(), frame=1)
    assign("ElTime", get("StopTime", frame=1) -
        get("StartTime", frame=1), frame=1)
    cat(deparse(substitute(fun)), "took time",
        get("ElTime", frame=1), "\n ")
}
> inspect(ar.burg(lynx))
d> resume
You can suppress the automatic messages entering function fun and
leaving function fun by issuing the track instruction with the flag
print=F. For example, in our previous example, our initial call to
track specified tracking on entry, so only the entry message was
printed. To suppress that message, simply add the flag print=F after
the specification of entry or exit:
Modifying the Evaluation Frame
We have already seen one use of the eval instruction, to examine the
objects in the current evaluation frame. More generally, you can use
eval to evaluate any S-PLUS expression. In particular, you can modify
values in the current evaluation frame, with those values then being
used in the subsequent evaluation of the function being debugged.
Thus, if you discover where your error occurs, you can modify the
offending expression, evaluate it, and assign the appropriate value in
the current frame. If the fix works, the complete evaluation should
give the correct results. Of course, you still need to make the change
(with the fix function) in the actual function. But using eval provides
a useful testing tool inside the inspector. For example, once we have
> inspect(make.data.sets(5))
d> step 2
d> objects
[1] "data 1" "data 2" "data 3" "data 4" "data 5"
Here we see that the names are not what we wanted. To test our
assumption that we need the sep="" argument, use eval as follows:
Our change has given the correct names; now resume evaluation and
see if the data sets are actually created:
d> resume
> data1
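The effect of the sep argument can also be checked at the ordinary
prompt, outside the inspector. The following is a minimal sketch that
uses only paste. By default, paste separates its arguments with a single
space, which is why the original names contain blanks.

```r
# Default separator is a single space, giving names such as "data 1"
paste("data", 1:5)

# An empty separator gives the intended names, such as "data1"
paste("data", 1:5, sep = "")
```

If the second call returns the names you expect, the same sep="" argument
is the change to carry into make.data.sets.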
Error Actions in the Inspector
When an error occurs in the function being inspected, inspect calls
the current error.action. By default, this action has three parts, as
follows:
1. Produce a traceback of the sequence of function calls at the
time of the error.
2. Dump the frames existing at the time of the error.
3. Start a restricted version of inspect that allows you to
examine frames and evaluate expressions, but not proceed
with further evaluation of the function being inspected.
Thus, you can examine the evaluation frame and the objects within it
at the point the error occurred. You can use the up and down
instructions to change frames, and the objects, find, on.exit, and
return.value instructions to examine the contents of the frames. The
instructions eval, fundef, help, and quit are also available in the
restricted version of inspect. For example, consider the primes
function described in Chapter 4, Writing Functions in S-PLUS. We can
introduce an error by commenting out the line that defines the
variable smallp:
primes <-
function(n = 100)
{
    n <- as.integer(abs(n))
    if(n < 2)
        return(integer(0))
    p <- 2:n
    # smallp <- integer(0)
    #
    # the sieve
    repeat
    {   i <- p[1]
        smallp <- c(smallp, i)
        p <- p[p %% i != 0]
        if(i > sqrt(n))
            break
    }
    c(smallp, p)
}
> inspect(primes())
d> do 2
d> do
d> quit
> inspect(primes())
d> do 2
repeat {
i <- p[1]
smallp <- c(smallp, i)
...
You can then edit the primes function to fix the error.
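For reference, a sketch of the repaired function follows. It is the
broken listing with the commented-out initialization of smallp restored;
nothing else is changed.

```r
primes <-
function(n = 100)
{
    n <- as.integer(abs(n))
    if(n < 2)
        return(integer(0))
    p <- 2:n
    # restored: smallp must exist before the loop appends to it
    smallp <- integer(0)
    # the sieve: move the smallest remaining value to smallp and
    # drop its multiples from p, until that value exceeds sqrt(n)
    repeat
    {   i <- p[1]
        smallp <- c(smallp, i)
        p <- p[p %% i != 0]
        if(i > sqrt(n))
            break
    }
    c(smallp, p)
}

primes(30)
```

With smallp defined, the call primes(30) should return the ten primes
from 2 through 29.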
Limitations of Inspect
Other Debugging Tools
Using the S-PLUS Browser Function
The browser function is useful for debugging functions when you
know an error occurs after some point in the function. If you insert a
call to browser into your function at that point, you can check all
assignments up to that point, and verify that they are indeed the
correct ones. For example, to return to our make.data.sets example,
we could have replaced our original cat statement with a call to
browser:
make.data.sets <-
function(n)
{
    names <- paste("data", 1:n)
    browser()
    for(i in 1:n)
    {   assign(names[i], runif(100), where = 1)
    }
}
> make.data.sets(5)
Called from: make.data.sets(5)
b(make.data.sets)>
Type ? at the prompt to get brief help on the browser, plus a listing of
the variables in the local frame:
b(make.data.sets)> ?
Type any expression. Special commands:
`up', `down' for navigation between frames.
`c' # exit from the browser & continue
`stop' # stop this whole task
b(make.data.sets)> q
>
Using the S-PLUS Debugger
If a function is broken, so that it returns an error reliably when called,
there is an alternative to all those cat and browser statements: the
debugger function. To use debugger on a function, you must have the
function’s list of frames dumped to disk. You can do this in several
ways:
• Call dump.frames() from within the function.
• Call dump.frames() from the browser.
• Set options(error=expression(dump.frames())). If you use
this option, you should reset it to the default
(expression(dump.calls())) when you are finished
debugging, because dumped frames can be quite large.
Then, when an error occurs, you can call the debugger function with
no arguments, which in turn uses the browser function to let you
browse through the dumped frames of the broken function. Use the
usual browser commands (?, up, down, and frame numbers) to move
through the dumped frames.
For example, consider the following simple function:
debug.test <-
function()
{
    x <- 1:10
    sin(z)
}
This has an obvious error in the second line of the body, so it will fail
if run. To use debugger on this function, do the following:
> options(error=expression(dump.frames()))
> debug.test()
Problem in debug.test(): Object "z" not found
Evaluation frames saved in object "last.dump", use
debugger() to examine them
> debugger()
Message: Problem in debug.test(): Object "z" not found
browser: Frame 11
b(sin)>
You are now in the browser, and can view the information in the
dumped frames as described above.
Tracing Function Evaluation
Another way to use the browser function is with the trace function,
which modifies a specified function so that some tracing action is
taken whenever that function is called. You can specify that the action
be to call the browser function (with the argument tracer = browser),
providing yet another way to track down bugs.
Do not use trace on any function if you intend to do your debugging with inspect.
> trace(make.data.sets,browser)
> make.data.sets
function(n) {
    if(.Traceon)
    {   .Internal(assign(".Traceon", F, where = 0),
            "S_put")
        cat("On entry: ")
        browser()
        .Internal(assign(".Traceon", T, where = 0),
            "S_put")
    } else
    {   names <- paste("data", 1:n)
        for(i in 1:n)
        {   assign(names[i], runif(100), where = 1)
        }
    }
}
> make.data.sets(3)
On entry: Called from: make.data.sets(3)
b(2)> ?
1: n
b(2)>
> trace(make.data.sets,browser,at=2)
> make.data.sets
function(n) {
    names <- paste("data", 1:n)
    {   if(.Traceon)
        {   .Internal(assign(".Traceon", F,
                where = 0), "S_put")
            cat("At 2: ")
            browser()
            .Internal(assign(".Traceon", T,
                where = 0), "S_put")
        }
        for(i in 1:n)
        {   assign(names[i], runif(100), where = 1)
        }
    }
}
> make.data.sets(3)
At 2: Called from: make.data.sets(3)
b(2)> ?
1: names
2: n
b(2)>
Chapter 7
EDITABLE GRAPHICS COMMANDS
Introduction 239
Getting Started 241
Graphics Objects 244
Graph Sheets 244
Graphs 244
Axes 245
Plots 245
Annotations 245
Object Path Names 246
Graphics Commands 249
Plot Types and Plot Classes 249
Viewing Argument Lists and Online Help 252
Specifying Data 253
Display Properties 254
Displaying Dialogs 257
Plot Types 258
The Plots2D and ExtraPlots Palettes 258
The Plots3D Palette 272
Titles and Annotations 281
Titles 281
Legends 281
Other Annotations 282
Locating Positions on a Graph 285
Formatting Axes 287
Formatting Text 289
Modifying the Appearance of Text 290
Superscripts and Subscripts 291
Greek Text 291
Colors 292
INTRODUCTION
Chapter 3 through Chapter 6 in the User’s Guide introduce the
editable graphics system that is part of the S-PLUS graphical user
interface. As the chapters are part of the User’s Guide, they focus on
creating and customizing editable graphics via the point-and-click
approach. In this chapter, we show how to create and modify such
graphics by calling S-PLUS functions directly. All of the graphics
available in the Plots2D, Plots3D, and ExtraPlots palettes can be
generated by pointing and clicking, or by typing commands in the
Script and Commands windows. Likewise, editable graphs can be
modified by using the appropriate dialogs and the Graph toolbar, or
by calling functions that make the equivalent modifications.
Note
The graphics produced by the Statistics menus and dialogs are traditional graphics. See Chapter
8 and Chapter 9 in the Programmer’s Guide for details.
GETTING STARTED
The guiPlot function emulates the action of interactively creating
plots by first selecting columns of data and then clicking on a button
in a plot palette. The colors, symbol types, and line styles used by
guiPlot are equivalent to those specified in both the Options > Graph
Styles dialogs and the individual graphics dialogs. The
arguments to guiPlot are:
> args(guiPlot)
function(PlotType = "Scatter", NumConditioningVars = 0,
Multipanel = "Auto", GraphSheet = "", AxisType = "Auto",
Projection = F, Page = 1, Graph = "New", Rows = "",
Columns = "", ...)
> guiGetPlotClass()
[1] "Scatter" "Isolated Points"
[3] "Bubble" "Color"
[5] "Bubble Color" "Text as Symbols"
[7] "Line" "Line Scatter"
[9] "Y Series Lines" "XY Pair Lines"
[11] "Y Zero Density" "Horiz Density"
[13] . . .
The following call places the plots in two separate panels that have
the same x axis scaling but different y axis scaling:
GRAPHICS OBJECTS
There are five main types of graphics objects in the editable graphics
system: graph sheets, graphs, axes, plots, and annotations. Plots are
contained in graphs, and graphs are contained in graph sheets. Most
graphics objects cannot exist in isolation. If a graphics object is
created in isolation, it generates an appropriate container. For
example, when you create a plot, the appropriate graph, axes and
graph sheet are automatically configured and displayed.
In general, the simplest way to create plots is with guiPlot. You can
create all types of graphics objects with the guiCreate function. The
properties of graphics objects can be modified using the guiModify
function. In this section, we briefly describe each of the graphics
objects; the section Graphics Commands on page 249 discusses
guiPlot, guiCreate, and guiModify in more detail.
Graph Sheets
Graph sheets are the highest-level graphics object. They are documents
that can be saved, opened, and exported to a wide variety of graphics
formats. Graph sheet properties determine the orientation and shape of
the graph sheet, the units on the axes, the default layout used when
new graphs are added, and any custom colors that are available for
other objects. Graph sheets typically contain one or more graphs in
addition to annotation objects such as text, line segments, arrows, and
extra symbols.
Graphs
There are six types of graphs in the editable graphics system: 2D, 3D,
Matrix, Smith, Polar, and Text. The graph type determines the
coordinate system used within the graph:
• A 2D graph can have one or more two-dimensional
coordinate systems, each composed of an x and y axis.
• A 3D graph has a single three-dimensional coordinate system
defined by a 3D axes object.
• A Matrix graph has a set of two-dimensional coordinate
systems drawn in a matrix layout.
• Smith plots are specialized graphs used in microwave
engineering that have a single two-dimensional coordinate
system.
Axes
The characteristics of the coordinate systems within graphs are set by
the properties of axes objects. Typically, axes properties contain
information about the range, tick positions, and display characteristics
of an axis, such as line color and line weight. Axes for 2D graphs also
have properties that determine scaling and axis breaks. All axes other
than those for 2D graphs contain information about tick labels and
axis titles; 2D axes contain separate objects for tick labels and axis
titles, both of which have their own properties.
Plots
A plot contains data specifications and options relating to how the
data are displayed. In many cases, a plot determines the type of
calculations that S-PLUS performs on the data before drawing the plot.
A plot is always contained within a graph and is associated with a
particular type of coordinate system. For example, a 2D graph can
contain any of the following plot types, among others: bar charts, box
plots, contour plots, histograms, density plots, dot charts, line plots,
and scatter plots. Plot properties are components that describe aspects
of the plot such as the line style and color.
Object Path Names
Every graph object in S-PLUS has a unique path name that identifies
it. A valid path name has the following components:
• The first component is the name of the graph sheet preceded
by $$.
• The name of the graph sheet is followed by the graph number
or annotation number.
• The name of the graph is followed by the plot number, axis
number, or annotation number.
• The name of an annotation can be followed by numbers that
correspond to specific components. For example, legends are
annotations that can contain legend items, which control the
display of individual entries in a legend.
• In 2D graphics, the name of an axis can be followed by
numbers that correspond to tick labels or axis titles.
• The name of some plots can be followed by numbers that
correspond to particular plot components. For example,
confidence intervals are components that are associated with
specific curve fit plots.
The components in the path name for a graph object are separated by
dollar signs. You can think of the individual components as
containers. For example, plots are contained within graphs, and
graphs are contained within graph sheets; therefore, the path name
$$GS1$1$1 refers to the first plot in the first graph of the graph sheet
named GS1. Likewise, annotations can be contained within graphs, so
the path name $$GS1$1$1 can also refer to the first annotation in the
first graph of GS1. Figure 7.1 visually displays this hierarchy of object
path names.
If a path name does not include the name of a graph sheet, S-PLUS
assumes it refers to the current graph sheet instead. The current graph
sheet is the one that was most recently created, modified, or viewed.
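The composition rule described above can be sketched with paste. The
helper gs.path below is hypothetical and is not part of S-PLUS; it only
illustrates how a graph sheet name and component numbers combine into a
path name, assuming the graph sheet is named GS1 as in the text.

```r
# Hypothetical helper (not an S-PLUS function): prefix the graph sheet
# name with "$$", then join it to the component numbers with "$".
gs.path <- function(sheet, ...)
{
    paste(paste("$$", sheet, sep = ""), ..., sep = "$")
}

gs.path("GS1")         # the graph sheet itself: "$$GS1"
gs.path("GS1", 1)      # first graph (or annotation) on GS1: "$$GS1$1"
gs.path("GS1", 1, 1)   # first plot in the first graph: "$$GS1$1$1"
```

Because plots, axes, and annotations are each numbered within their
container, the same string can name different objects; the function that
receives the path name determines which kind of object is meant.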
[Tree diagram; surviving node labels: Graph Sheet, Annotation, Graph]
Figure 7.1: Hierarchy of graph objects in path names. Each node in the tree can be a
component of a path name. To construct a full path name for a particular type of
graph object, follow a branch in the tree and place dollar signs between the names in
the branch.
You can use the following functions to obtain path names for specific
types of graph objects. Most of the functions accept a value for the
GraphSheet argument, which is a character vector giving the name of
the graph sheet. By default, GraphSheet="" and the current graph
sheet is used.
• guiGetAxisLabelsName: Returns the path name of the tick
labels for a specified axis. By default, S-PLUS returns the path
name of the labels for axis 1, which is the first x axis in the first
plot on the graph sheet.
• guiGetAxisName: Returns the path name of a specified axis.
By default, the path name for axis 1 is returned.
• guiGetAxisTitleName: Returns the path name of the title for a
specified axis. By default, the path name of the title for axis 1
is returned.
GRAPHICS COMMANDS
This section describes the programming interface to the editable
graphics system. The three main functions we discuss are guiPlot,
guiCreate, and guiModify. You can use guiPlot and guiCreate to
draw graphics and guiModify to change particular properties of
your plots. For detailed descriptions of the plot types and their GUI
options, see the User’s Guide.
Throughout this chapter, we emphasize using guiPlot over
guiCreate to generate editable graphics. This is primarily because
guiPlot is easier to learn for basic plotting purposes. In this section,
however, we provide examples using both guiPlot and guiCreate.
The main differences between the two functions are:
• The guiPlot function is used exclusively for editable
graphics, while guiCreate can be used to create other GUI
elements such as new Data windows and Object Explorer
pages.
• The guiPlot function accepts a plot type as an argument while
guiCreate accepts a plot class. We discuss this distinction more
in the subsection below.
• Calls to guiPlot are recorded in the condensed History Log
while calls to guiCreate are recorded in the full History Log.
If you are interested solely in the editable graphics system, we
recommend using guiPlot to create most of your plots. If you are
interested in programmatically customizing the S-PLUS graphical user
interface, using guiCreate to generate graphics may help you become
familiar with the syntax of the required function calls.
Plot Types and Plot Classes
S-PLUS includes a large number of editable plot types, as evidenced
by the collective size of the three plot palettes. Plot types are
organized into various plot classes, so that the plots in a particular class
share a set of common properties. To see a list of all classes for the
S-PLUS graphical user interface (of which the plot classes are a subset),
use the guiGetClassNames function.
> guiGetClassNames()
See the section Plot Types on page 258 for comprehensive lists of
plots and their corresponding plot classes. Table 7.1 lists the most
common classes for the remaining graph objects (graph sheets,
graphs, axes, and annotations).
Table 7.1: Common classes for graph objects. This table does not include plot classes.
For example, Line, Scatter, and Line Scatter plots are all members
of the plot class LinePlot. You can create a scatter plot easily with
either guiPlot or guiCreate as follows:
Note that guiPlot accepts the plot type Line Scatter as its first
argument while guiCreate accepts the plot class LinePlot. The
guiCreate arguments DataSet, xColumn, and yColumn all define
properties of a LinePlot graphic; they correspond to the first three
entries on the Data to Plot page of the Line/Scatter Plot dialog.
To create a line plot with symbols using all of the default values, type:
Similarly, you can create a line plot without symbols using either of
the following commands:
Viewing Argument Lists and Online Help
You can obtain on-line help for guiPlot using the help function just
as you would for any other built-in command. The help files for
guiCreate and guiModify are structured by class name, however.
Typing help("guiCreate") displays a short, general help file; to see a
detailed help page, you must also include the class name. For
example, to see help on the LinePlot class, type:
> help("guiCreate(\"LinePlot\")")
> guiPrintClass("LinePlot")
CLASS: LinePlot
ARGUMENTS:
Name
Prompt: Name
Default: ""
DataSet
Prompt: Data Set
Default: "fuel.frame"
xColumn
Prompt: x Columns
Default: ""
yColumn
Prompt: y Columns
Default: ""
zColumn
Prompt: z Columns
Default: ""
. . .
The Prompt value gives the name of the field in the Line/Scatter
Plot dialog that corresponds to each argument. The Default entry
gives the default value for the argument, and Option List shows the
possible values the argument can assume.
The argument lists for guiCreate and guiModify are also organized
by class name. Instead of using the args function to see a list of
arguments, use the guiGetArgumentNames function. For example, the
following command lists the arguments and properties that you can
specify for the LinePlot class:
> args(guiModify)
function(classname, GUI.object, ...)
Specifying Data
You can specify data for plots either by name or by value. The examples
so far in this section illustrate the syntax for specifying data by name.
The commands in the examples all refer to data sets and their
guiPlot("Scatter", DataSetValues =
fuel.frame[, c("Mileage","Weight")])
guiCreate("LinePlot",
xValues = fuel.frame$Mileage,
yValues = fuel.frame$Weight)
If you generate plots from within a function, you may want to pass the
data by value if you construct the data set in the function as well.
S-PLUS erases the data upon termination of the function. Therefore,
any graphs the function generates by passing the data by name will be
empty.
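The following sketch illustrates the by-value approach inside a
function. The function name sim.scatter and the simulated columns x and
y are hypothetical; the guiPlot call mirrors the DataSetValues form shown
earlier in this section and assumes the S-PLUS graphical user interface is
available.

```r
# Hypothetical function: build a data set locally, then plot it by value.
sim.scatter <- function(n = 50)
{
    # sim exists only in this function's frame and is erased on return
    sim <- data.frame(x = 1:n, y = rnorm(n))
    # DataSetValues passes the values themselves, so the graph does not
    # depend on an object that disappears when the function exits
    guiPlot("Scatter", DataSetValues = sim[, c("x", "y")])
    invisible(sim)
}
```

Had the function passed the name of sim instead, the resulting graph
would be empty, for the reason given above.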
LineColor
    Description: Color of the lines drawn between data points in the
    plot. Accepts a character vector naming the color.
    Settings: "Transparent", "Black", "Blue", "Green", "Cyan",
    "Red", "Magenta", "Brown", "Lt Gray", "Dark Gray", "Lt Blue",
    "Lt Green", "Lt Cyan", "Lt Red", "Lt Magenta", "Yellow",
    "Bright White", "User1", "User2", ..., "User16".
LineStyle
    Description: Style of the lines drawn between data points in the
    plot. Accepts a character vector naming the style.
    Settings: "None", "Solid", "Dots", "Dot Dash", "Short Dash",
    "Long Dash", "Dot Dot Dash", "Alt Dash", "Med Dash", "Tiny Dash".
SymbolColor
    Description: Color of the symbols used to plot the data points.
    Accepts a character vector naming the color.
    Settings: Identical to the settings for LineColor.
SymbolStyle
    Description: Style of the symbols used to plot the data points.
    Accepts either an integer value representing the style or a
    character vector naming it.
    Settings: Integer values: 0, 1, 2, ..., 27.
    Corresponding character values:
    "None"; "Circle, Solid"; "Circle, Empty";
    "Box, Solid"; "Box, Empty";
    "Triangle, Up, Solid"; "Triangle, Dn, Solid";
    "Triangle, Up, Empty"; "Triangle, Dn, Empty";
    "Diamond, Solid"; "Diamond, Empty"; "Plus";
    "Cross"; "Ant"; "X"; "-"; "|"; "Box X"; "Plus X";
    "Diamond X"; "Circle X"; "Box +"; "Diamond +";
    "Circle +"; "Tri. Up Down"; "Tri. Up Box";
    "Tri. Dn Box"; "Female"; "Male".
Because you can pass each of the properties in Table 7.2 to guiCreate
as well as to guiModify, you can also draw the plot using a single call
to guiCreate:
Displaying Dialogs
You can use the guiDisplayDialog function to open the property
dialog for a particular graph object. For example, the following
command displays the dialog for the current plot of class LinePlot:
The properties for the plot may be modified using the dialog that
appears.
PLOT TYPES
The S-PLUS editable graphics system has a wide variety of available
plot types. In this section, we present guiPlot commands you can use
to generate each type of plot. The plots are organized first by palette
(Plots2D, ExtraPlots, and Plots3D) and then by plot class. We
discuss commands for customizing axes and layout operations in a
later section. For additional details on any of the plot types, see the
User’s Guide.
As we mention in the section Getting Started on page 241, you can
use the guiGetPlotClass function to see a list of all plot types that
guiPlot accepts. Once you know the name of a particular plot type,
you can also use guiGetPlotClass to return its class. For example, the
Bubble plot type belongs to the LinePlot class:
> guiGetPlotClass("Bubble")
[1] "LinePlot"
Knowing both the type and class for a particular plot allows you to
use guiPlot, guiCreate, and guiModify interchangeably.
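As a sketch of this interchangeability, the class returned by
guiGetPlotClass can be passed directly to guiModify after a guiPlot call.
The data columns and the SymbolStyle setting below are illustrative
choices only:

```s
# Draw a Bubble plot by its plot type name.
guiPlot("Bubble", DataSetValues =
  data.frame(util.mktbook, util.earn, 1:45))
# Look up the class of the plot type ("LinePlot" for Bubble) and use it
# to modify the plot that was just created.
plot.class <- guiGetPlotClass("Bubble")
guiModify(plot.class, Name = guiGetPlotName(),
  SymbolStyle = "Box, Empty")
```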
The Plots2D and ExtraPlots Palettes
The Plots2D and ExtraPlots palettes contain a collection of
two-dimensional plots. Table 7.3 shows a quick description of the plot
classes and the plots that belong to each of them.
Table 7.3: The plot types available in the Plots2D and ExtraPlots palettes. The left column of the table gives
the class that each plot type belongs to.
Plot Class     Description              Plot Types
LinePlot       Line and scatter plots.  Scatter, Line, Line Scatter, Isolated Points,
                                        Text as Symbols, Bubble, Color, Bubble Color,
                                        Vert Step, Horiz Step, XY Pair Scatters,
                                        XY Pair Lines, High Density, Horiz Density,
                                        Y Zero Density, Robust LTS, Robust MM, Loess,
                                        Spline, Supersmooth, Kernel, Y Series Lines,
                                        Dot.
LinearCFPlot   Linear curve fit plots.  Linear Fit, Poly Fit, Exp Fit, Power Fit,
                                        Ln Fit, Log10 Fit.
BarPlot        Bar plots.               Bar Zero Base, Bar Y Min Base, Grouped Bar,
                                        Stacked Bar, Horiz Bar, Grouped Horiz Bar,
                                        Stacked Horiz Bar, Bar with Error,
                                        Grouped Bar with Error.
ErrorBarPlot   Error bar plots.         Error Bar, Horiz Error Bar, Error Bar - Both.
The LinePlot Class
The LinePlot class includes various kinds of line and scatter plots.
The scatter plot is the fundamental visual technique for viewing and
exploring relationships in two-dimensional data. Its extensions
include line plots, text plots, bubble plots, step plots, robust linear fits,
smooths, and dot plots. The line and scatter plots we illustrate here
are the most basic types of plots for displaying data. You can use
many of them to plot a single column of data as well as one data
column against another.
Scatter plot
guiPlot("Scatter", DataSetValues =
data.frame(util.mktbook, util.earn))
Line plot
guiPlot("Line", DataSetValues =
data.frame(util.mktbook, util.earn))
Bubble plot
guiPlot("Bubble", DataSetValues =
data.frame(util.mktbook, util.earn, 1:45))
Color plot
guiPlot("Color", DataSetValues =
data.frame(util.mktbook, util.earn, 1:45))
Bubble color plot
guiPlot("BubbleColor", DataSetValues =
data.frame(util.mktbook, util.earn, 45:1, 1:45))
Loess smooth
guiPlot("Loess", DataSetValues =
data.frame(util.mktbook, util.earn))
Smoothing spline
guiPlot("Spline", DataSetValues =
data.frame(util.mktbook, util.earn))
Friedman’s supersmoother
guiPlot("Supersmooth", DataSetValues =
data.frame(util.mktbook, util.earn))
Kernel smooth
guiPlot("Kernel", DataSetValues =
data.frame(util.mktbook, util.earn))
Y series lines
Dot plot
guiPlot("Dot", DataSetValues =
data.frame(NumCars = table(fuel.frame$Type),
CarType = levels(fuel.frame$Type)))
The LinearCFPlot Class
The linear, polynomial, exponential, power, and logarithmic curve
fits all have class LinearCFPlot. Curve-fitting plots in this class display
a regression line with a scatter plot of the associated data points. The
curves are computed with an ordinary least-squares algorithm.
Linear fit
Polynomial fit
Exponential fit
Power fit
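The plots named above can be sketched with guiPlot, using the plot type
names listed for LinearCFPlot in Table 7.3. The choice of the
util.mktbook and util.earn columns is an assumption carried over from
the LinePlot examples:

```s
# Linear fit
guiPlot("Linear Fit", DataSetValues =
  data.frame(util.mktbook, util.earn))
# Polynomial fit
guiPlot("Poly Fit", DataSetValues =
  data.frame(util.mktbook, util.earn))
# Exponential fit
guiPlot("Exp Fit", DataSetValues =
  data.frame(util.mktbook, util.earn))
# Power fit
guiPlot("Power Fit", DataSetValues =
  data.frame(util.mktbook, util.earn))
```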
The NonlinearCFPlot Class
The NonlinearCFPlot class includes a single plot type for fitting
nonlinear curves. In addition to the data, this type of plot needs a
formula and a vector of initial values for any specified parameters.
For this reason, it is usually easier to create the plot with a single call
to guiCreate, rather than sequential calls to guiPlot and guiModify.
Nonlinear fit
The MatrixPlot Class
The MatrixPlot class includes a single plot type for displaying
scatterplot matrices. This type of plot displays an array of pairwise
scatter plots illustrating the relationship between any pair of variables
in a data set.
Scatterplot matrix
The BarPlot Class
A wide variety of bar plots are available in the editable graphics
system via the BarPlot class. A bar plot displays a bar for each point in
a set of observations, where the height of a bar is determined by the
value of the data point. For most ordinary comparisons, we
recommend the simplest bar plot with the zero base. For more
complicated analysis, you may wish to display grouped bar plots,
stacked bar plots, or plots with error bars.
The HiLowPlot Class
The HiLowPlot class contains two types of plots: the high-low plot and
the candlestick plot. A high-low plot typically displays lines indicating
the daily, monthly, or yearly extreme values in a time series. These
kinds of plots can also include average, opening, and closing values,
and are referred to as high-low-open-close plots in these cases.
Meaningful high-low plots can thus display from three to five
columns of data, and illustrate simultaneously a number of important
characteristics about time series data. Because of this, they are most
often used to display financial data.
One variation on the high-low plot is the candlestick plot. Where
typical high-low plots display the opening and closing values of a
financial series with lines, candlestick plots use filled rectangles. The
color of the rectangle indicates whether the difference is positive or
negative. In S-PLUS, cyan rectangles represent positive differences,
when closing values are larger than opening values. Dark blue
rectangles indicate negative differences, when opening values are
larger than closing values.
High-low-open-close plot
Candlestick plot
The BoxPlot Class
The BoxPlot class contains box plots that show the center and spread
of a data set as well as any outlying data points. In the editable
graphics system, box plots can be created for a single variable or a
grouped variable.
The AreaPlot Class
The AreaPlot class contains a single plot type that displays area plots.
An area chart fills the space between adjacent series with color. It is
most useful for showing how each series in a data set affects the whole
over time.
Area plot
guiPlot("Area", DataSetValues =
data.frame(car.time, car.gals))
The QQPlot Class
The QQPlot class produces quantile-quantile plots, or qqplots, which
are extremely powerful tools for determining good approximations to
the distributions of data sets. In a one-dimensional qqplot, the
ordered data are graphed against quantiles of a known theoretical
distribution. If the data points are drawn from the theoretical
distribution, the resulting plot is close to the line y = x in shape. The
normal distribution is often the distribution used in this type of plot,
giving rise to the plot type "QQ Normal". In a two-dimensional qqplot,
the ordered values of the variables are plotted against each other. If
the variables have the same distribution shape, the points in the
qqplot cluster along a straight line.
QQ normal plot
QQ plot
The PPPlot Class
The PPPlot class produces probability plots. A one-dimensional
probability plot is similar to a qqplot except that the ordered data
values are plotted against the quantiles of a cumulative probability
distribution function. If the hypothesized distribution adequately
describes the data, the plotted points fall approximately along a
straight line. In a two-dimensional probability plot, the observed
cumulative frequencies of both sets of data values are plotted against
each other; if the data sets have the same distribution shape, the
points in the plot cluster along the line y = x.
PP normal plot
PP plot
The ParetoPlot Class
The ParetoPlot class displays Pareto charts, which are essentially
specialized histograms. A Pareto chart orders the bars in a histogram
from the most frequent to the least frequent, and then overlays a line
plot to display the cumulative percentages of the categories. This type
of plot is most useful in quality control analysis, where it is generally
helpful to focus resources on the problems that occur most frequently.
In the examples below, we use the data set exqcc2 that is located in
the samples\Documents\exqcc2.sdd file under your S-PLUS home
directory.
data.restore(paste(getenv("SHOME"),
"samples/Documents/exqcc2.sdd", sep = "/"))
guiPlot("Pareto", DataSet = "exqcc2",
Columns = "NumSample, NumBad")
The Histogram Class
The Histogram class creates histograms and density plots for
one-dimensional data. Histograms display the number of data points that
fall in each of a specified number of intervals. A density plot displays
an estimate of the underlying probability density function for a data
set and allows you to approximate the probability that your data fall
in any interval. A histogram gives an indication of the relative density
of the data points along the horizontal axis. For this reason, density
plots are often superposed with (scaled) histograms.
Histogram
Density plot
The PiePlot Class
The PiePlot class displays pie charts, which show the share of
individual values in a variable relative to the sum total of all the
values. The size of a pie wedge is relative to a sum, and does not
directly reflect the magnitude of the data value. Because of this, pie
charts are most useful when the emphasis is on an individual item’s
relation to the whole; in these cases, the sizes of the pie wedges are
naturally interpreted as percentages.
Pie chart
guiPlot("Pie", DataSetValues =
data.frame(table(fuel.frame$Type)))
The ErrorBarPlot Class
The ErrorBarPlot class includes error bar plots, which display a range
of error around each plotted data point.
The ContourPlot Class
The ContourPlot class displays contour plots and level plots. A
contour plot is a representation of three-dimensional data in a flat,
two-dimensional plane. Each contour line represents a height in the z
direction from the corresponding three-dimensional surface. A level
plot is essentially identical to a contour plot, but it has default options
that allow you to view a particular surface differently.
Contour plot
Level plot
The VectorPlot Class
The VectorPlot class contains the vector plot type, which uses arrows
to display the direction and velocity of flow at particular positions in a
two-dimensional plane. To create a vector plot, specify two columns
of data for the positions of the arrows, a third column of data for the
angle values (direction), and a fourth column of data for the
magnitude (length). In the example below, we use the data set
exvector that is located in the samples\Documents\exvector.sdd
file under your S-PLUS home directory.
Vector plot
data.restore(paste(getenv("SHOME"),
"samples/Documents/exvector.sdd", sep = "/"))
guiPlot("Vector", DataSet = "exvector",
Columns = "x, y, angle, mag")
The CommentPlot Class
The CommentPlot class contains the comment plot type, which displays
character labels on a two-dimensional graph. You can use comment
plots to display character data, plot combinations of characters as
symbols, produce labeled scatter plots, and create tables. To create a
comment plot, specify two columns of data for the position of each
comment and a third column for the text.
Comment plot
guiPlot("Comment", DataSetValues =
data.frame(x = 1:26, y = rnorm(26), z = LETTERS))
The SmithPlot Class
The SmithPlot class contains Smith plots, which are drawn in polar
coordinates. This type of plot is often used in microwave engineering
to show impedance characteristics. There are three types of Smith
plots: reflection, impedance, and circle. In a reflection plot, the x
values are magnitudes in the range [0,1] and the y values are angles
in degrees that are measured clockwise from the horizontal. In an
impedance plot, the x values are resistance data and the y values are
reactance data. In a circle plot, the x values are positive and specify
the distance from the center of the Smith plot to the center of the
circle you want to draw. The y values are angles that are measured
clockwise from the horizontal; the z values are radii and must also be
positive.
Smith plots
# Reflection plot.
guiPlot("Smith", DataSetValues =
data.frame(x = seq(from=0, to=1, by=0.1), y = 0:10))
guiModify("SmithPlot", Name = guiGetPlotName(),
AngleUnits = "Radians")
# Impedance plot.
guiPlot("Smith", DataSetValues =
data.frame(x = seq(from=0, to=1, by=0.1), y = 0:10))
guiModify("SmithPlot", Name = guiGetPlotName(),
DataType = "Impedance", AngleUnits = "Radians")
# Circle plot.
guiPlot("Smith", DataSetValues =
data.frame(x = seq(from=0, to=1, by=0.1), y = 0:10,
z = seq(from=0, to=1, by=0.1)))
guiModify("SmithPlot", Name = guiGetPlotName(),
DataType = "Circle", AngleUnits = "Radians")
The PolarPlot Class
The PolarPlot class displays line and scatter plots in polar
coordinates. To create a polar plot, specify magnitudes for the x values
in your data and angles (in radians) for the y values.
The last nine plots in the Plots3D palette are composite plots that do
not have their own classes. Instead, they are tools that allow you to
view plots we’ve discussed already in new and different ways. The
tools fall into two broad categories: rotated plots and conditioned plots.
We discuss each of these categories below.
Table 7.4: The plot types available in the Plots3D palette. The left column of the table gives the class that
each plot type belongs to.
Plot Class     Description             Plot Types
Line3DPlot     Line, scatter, drop-    3D Scatter, 3D Line, 3D Line Scatter, Drop Line
               line, and regression    Scatter, 3D Regression, 3D Reg Scatter.
               plots.
SurfacePlot    Surface and bar plots.  Coarse Surface, Data Grid Surface, Spline Surface,
                                       Filled Coarse Surface, Filled Data Grid Surface,
                                       Filled Spline Surface, 8 Color Surface, 16 Color
                                       Surface, 32 Color Surface, 3D Bar.
Grid3D         Projection planes.      This group of plots does not have formal plot types.
                                       The plots are listed in the Plots3D palette with the
                                       following names:
                                       XY Plane Z Min, XZ Plane Y Min, YZ Plane X Min,
                                       XY Plane Z Max, XZ Plane Y Max, YZ Plane X Max.
(none)         Rotated plots.          This group of plots has neither a plot class nor a
                                       corresponding formal plot type. The plots are listed
                                       in the Plots3D palette with the following names:
                                       2 Panel Rotation, 4 Panel Rotation, 6 Panel Rotation.
(none)         Conditioned plots.      This group of plots has neither a plot class nor a
                                       corresponding formal plot type. The plots are listed
                                       in the Plots3D palette with the following names:
                                       Condition on X, Condition on Y, Condition on Z,
                                       No Conditioning, 4 Panel Conditioning,
                                       6 Panel Conditioning.
x <- ozone.xy$x
y <- ozone.xy$y
z <- ozone.median
ozone.df <- data.frame(x,y,z)
To familiarize yourself with this data set and the 3D plot types, first
create a mesh surface plot:
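A minimal sketch of that call, assuming the "Data Grid Surface" plot type
from Table 7.4 is passed to guiPlot in the same way as the
two-dimensional plot types:

```s
# ozone.df holds the x, y, and z columns built above.
guiPlot("Data Grid Surface", DataSetValues = ozone.df)
```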
The Data Grid Surface is the first plot in the first graph of the graph
sheet. We give the plot of data points the name 1$2 to designate it as
the second plot in the first graph. For more details on naming
conventions for graph objects, see the section Object Path Names on
page 246.
You can use guiModify to rotate the axes:
Note that Rotate3Daxes is part of the properties for the graph type
Graph3D and not the plot type Line3DPlot; see the section Graphics
Objects on page 244 for details.
If you would like to see the surface again without the overlaid data
points, use the guiRemove function to remove the second plot:
The Line3DPlot Class
The Line3DPlot class contains scatter and line plots that display
multidimensional data in three-dimensional space. Typically, static
3D scatter and line plots are not effective because the depth cues of
single points are insufficient to give strong 3D effects. On some
occasions, however, they can be useful for discovering simple
relationships between three variables. To improve the depth cues in a
3D scatter plot, you can add drop lines to each of the points; this gives
rise to the plot type "Drop Line Scatter". The 3D Regression plot
draws a regression plane through the data points.
Scatter plot
Line plot
Regression plot
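These plots can be sketched with the plot type names from Table 7.4,
assuming the ozone.df data frame defined earlier in this section
supplies the x, y, and z columns:

```s
# Scatter plot
guiPlot("3D Scatter", DataSetValues = ozone.df)
# Line plot
guiPlot("3D Line", DataSetValues = ozone.df)
# Regression plot
guiPlot("3D Regression", DataSetValues = ozone.df)
# Scatter plot with drop lines for stronger depth cues
guiPlot("Drop Line Scatter", DataSetValues = ozone.df)
```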
The SurfacePlot Class
The SurfacePlot class includes different types of surface plots, which
are approximations to the shapes of three-dimensional data sets.
Spline surfaces are smoothed plots of gridded 3D data, and 3D bar
plots are gridded surfaces drawn with bars. For two variables, a 3D
bar plot produces a bivariate histogram that shows the joint
distribution of the data. A color surface plot allows you to specify
color fills for the bands or grids in your surface plot.
Coarse surface
Spline surface
Bar plot
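The surface and bar plots named above can be sketched with the
SurfacePlot type names from Table 7.4. Using ozone.df as the data is an
assumption carried over from the earlier examples in this section:

```s
# Coarse surface
guiPlot("Coarse Surface", DataSetValues = ozone.df)
# Spline surface
guiPlot("Spline Surface", DataSetValues = ozone.df)
# Bar plot
guiPlot("3D Bar", DataSetValues = ozone.df)
```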
The ContourPlot Class
The 3D contour plots are identical to 2D contour plots, except that
the contour lines are drawn in three-dimensional space instead of on
a flat plane. For more details, see the section The ContourPlot Class
on page 270.
Contour plot
The Grid3D Class
The Grid3D class contains a set of two-dimensional planes you can use
either on their own or overlaid on other 3D plots. The class is
separated into six plots according to which axis a plane intersects and
where. For example, the plot created by the XY Plane Z Min button
in the Plots3D palette intersects the z axis at its minimum.
The plots in the Grid3D class do not have their own plot types.