R Graphics
Hukum Chandra
ICAR-National Fellow & Principal Scientist
ICAR-Indian Agricultural Statistics Research Institute
Library Avenue, PUSA, New Delhi, India
www.iasri.res.in
Outline
Introduction to Graphics in R
Examples of commonly used graphics functions
Common options for customizing graphs
High-Level Plot Functions
Low-Level Plot Functions
Saving Graphs
Simple Graphics
Generating proper graphics is one of the most important aspects of data
presentation and analysis
Features of a data set can be explored very effectively using R graphics
R is capable of creating high-quality graphics
Graphs are typically created using a series of high-level and low-level
plotting commands
High-level functions create new plots; low-level functions add
information to an existing plot
Graphs are customized (line style, symbols, color, etc.) by specifying
graphical parameters
Graphical parameters are set globally with the par() function
Graphic Parameters
The function par() is used to set or query graphical parameters
par() has roughly 70 settings and allows you to adjust almost any
feature of a graph
Graphical parameters are reset to their defaults whenever a new
graphics device is opened
Most par() settings can also be passed as additional arguments to a
plotting command; a few, such as mfrow and mfcol, can only be set by
calling par() (see ?par for the others)
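A minimal sketch of querying, setting and restoring parameters with par(), assuming the default graphics device. The name old_par is illustrative. When par() is called with new values it returns the previous ones, so they can be restored afterwards.

```r
# Query the current value of a single parameter
par("mfrow")

# Set new values; par() invisibly returns the old ones
old_par <- par(mfrow = c(1, 2), mar = c(4, 4, 2, 1))

plot(1:10, main = "Left panel")
plot(10:1, main = "Right panel")

# Restore the previous settings
par(old_par)
```

Restoring the saved values keeps later plots in the same session from inheriting the two-panel layout.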
High-Level Plot Functions
Low-Level Plot Functions
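As an illustration of the two kinds of functions, the sketch below starts a plot with one high-level call and then builds it up with low-level calls. The data and labels are made up for the example.

```r
x <- seq(0, 10, by = 0.5)
y <- sqrt(x)

# High-level call: starts a new plot (axes, labels, box) but draws no data
plot(x, y, type = "n", main = "High-level, then low-level")

# Low-level calls: each one adds to the existing plot
points(x, y, pch = 16)
lines(x, y, lty = 2)
abline(h = 2, col = "gray")
text(8, 2.2, "y = 2")
legend("bottomright", legend = "sqrt(x)", pch = 16, lty = 2)
```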
Scatterplot and Line Graphs
Scatter plots are useful for studying dependencies between variables
The plot() function is used for producing scatterplots and line graphs
See ?plot
Using the plot command
x <- seq(0,10,0.2)
y <- sqrt(x)
plot(x,y); grid()
As one might guess, the last command adds a grid to the plot.
plot(x,y, type="b", col="blue", lwd=1, lty=4, pch=5, main="My plot",
xlab="x axis", ylab="y axis")
grid(col="red")
Common arguments for plot()
Argument   Description
type       1-character string denoting the plot type
xlim       x limits, c(x1, x2)
ylim       y limits, c(y1, y2)
main       Main title for the plot
sub        Sub title for the plot
xlab       x-axis label
ylab       y-axis label
col        Color for lines and points, either a character string or a number that indexes the palette()
pch        Number referencing a plotting symbol or a character string
cex        A number giving the character expansion of the plot symbols
lty        Number referencing a line type
lwd        Line width
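A short sketch that uses the arguments listed above in one call. The values are arbitrary and chosen only to make each argument's effect visible.

```r
x <- seq(0, 10, by = 0.2)
y <- sqrt(x)

plot(x, y,
     type = "o",            # points joined by lines
     xlim = c(0, 12),       # widen the x range beyond the data
     ylim = c(0, 4),
     main = "Square root",
     sub  = "illustrative subtitle",
     xlab = "x", ylab = "sqrt(x)",
     col  = "darkgreen",
     pch  = 17,             # filled triangle
     cex  = 0.8,            # slightly smaller symbols
     lty  = 1, lwd = 2)
```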
plot(x, y, type="b", col="blue", lwd=1, lty=4, pch=5, main="My plot",
     xlab="x axis", ylab="y axis")
grid(col="red")
text(8, 2, "this is my example plot")
abline(h=1, v=4, col=c("darkred","green"), lty=c(1,4), lwd=c(4,6))
reg.lm <- lm(y ~ x)   # regress y on x to match plot(x, y)
abline(reg.lm, col="red", lwd=6)   # add the regression line
There is a wealth of plotting parameters you can set
plot(x,y)
plot(x,y, pch=16) # filled circles as the plotting symbol
x1 <- seq(1,5,0.1)
lines(x1, .5*x1) # lines() adds the (x, y) values to the existing plot as a line
## EXAMPLES with Yield Data #########################
data2 <- read.csv("yielddata.csv", header=TRUE)
plot(data2$Fert, data2$Yield)
grid()
plot(data2$Fert, data2$Yield, type="p", col="blue", lwd=1, lty=4, pch=1,
     main="My plot for yield versus fertiliser", xlab="Fertiliser", ylab="Yield")
grid(col="red")
plot(data2$Fert, data2$Yield, type="p", col="green", lwd=1, lty=4, pch=9,
     main="My plot", xlab="x axis", ylab="y axis")
text(250000, 30000, "this is my example plot")
abline(h=20000, v=200000, col=c("darkred","green"), lty=c(1,4), lwd=c(1,2))
reg.lm <- lm(Yield ~ Fert, data=data2)   # regress yield on fertiliser
abline(reg.lm, col="red", lwd=6)   # add the regression line
dx <- rnorm(20, 5, 5)   ## 20 random numbers from a normal distribution with mean 5, sd 5
dy <- rchisq(20, 5)     ## 20 random numbers from a chi-squared distribution with 5 df (mean 5)
plot(dx, dy, pch=1)
fit <- lm(dy ~ dx)      # regress dy on dx to match plot(dx, dy)
abline(fit, col="red", lwd=4)
text(10, 4, "Fitted line")
See ?plot
See ?points
x <- rnorm(50) ;y <- rnorm(50)
group <- rbinom(50, size=1, prob=.5)
# Basic Scatterplot
plot(x, y)
plot(x, y, xlab="X values", ylab="Y values", main="Simple Y vs X", pch=15, col="red")
[Figure: scatterplot "Simple Y vs X" of Y values against X values]
# Distinguish between two separate groups
plot(x, y, xlab="X values", ylab="Y values", main="Grouped data Y vs X",
pch=ifelse(group==1, 5, 19), col=ifelse(group==1, "red", "blue"))
plot(x, y, xlab="X", ylab="Y", main="Y vs X", type="n")
points(x[group==1], y[group==1], pch=5, col="red")
points(x[group==0], y[group==0], pch=19, col="blue")
plot(x, y, xlab="X", ylab="Y", main="Y vs X", type="n")
points(cbind(x,y)[group==1,], pch=5, col="red")
points(cbind(x,y)[group==0,], pch=19, col="blue")
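A possible variant of the grouped plot that also adds a legend. It indexes small lookup vectors by group instead of calling ifelse() twice. The names pchs and cols and the seed are illustrative choices, not part of the original example.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
x <- rnorm(50); y <- rnorm(50)
group <- rbinom(50, size = 1, prob = 0.5)

# Lookup vectors indexed by group + 1
# (group 0 -> first element, group 1 -> second element)
pchs <- c(19, 5)
cols <- c("blue", "red")

plot(x, y, pch = pchs[group + 1], col = cols[group + 1],
     xlab = "X", ylab = "Y", main = "Grouped data with legend")
legend("topright", legend = c("group 0", "group 1"), pch = pchs, col = cols)
```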
Line Graphs
# Basic Line Graphs
plot(sort(x), sort(y), type="l", lty=2, lwd=2, col="blue")
plot(x, y, type="n")
lines(sort(x), sort(y), type="b")
lines(cbind(sort(x),sort(y)), type="l", lty=1, col="blue")
plot(sort(x), type="n")
lines(sort(x), type="b", pch=8, col="red")
lines(sort(y), type="l", lty=6, col="blue")
Histogram and Density Plot
Histograms are used to study the distribution of continuous data
hist(): function to plot a histogram
# Basic Histogram
## generate 100 random numbers from the standard normal distribution
u<- rnorm(100)
hist(u) #default histogram
hist(u, density=20) #with shading
The sequence of commands below plots two histograms in one window
par(mfrow=c(1,2)); hist(u);hist(u, density=50)
par(mfrow=c(a,b)) gives a rows with b plots on each row.
#with specific number of bins
par(mfrow=c(1,2)); hist(u, density=5, breaks=20); hist(u, density=20, breaks=20)
Read in the help file about hist- help(hist)
# Probability/proportion instead of frequency, also specifying the y-axis
# breaks must cover range(u); widen -3:3 if a value falls outside [-3, 3]
hist(u, density=20, breaks=-3:3, ylim=c(0,.5), prob=TRUE)
hist(u,freq=F,ylim = c(0,0.8))
curve(dnorm(x), col = 2, lty = 2, lwd = 2, add = TRUE)
The freq=F argument to hist ensures that the histogram is in terms of
densities rather than absolute counts
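The difference between counts and densities can be checked numerically. The sketch below computes the bins without drawing them; the seed is arbitrary.

```r
set.seed(1)                       # arbitrary seed
u <- rnorm(100)

h <- hist(u, plot = FALSE)        # compute the bins without drawing

# Counts sum to the sample size
sum(h$counts)

# Densities are counts / (n * bin width), so the bar areas sum to 1
sum(h$density * diff(h$breaks))
```

Because the bar areas sum to 1, a density curve such as dnorm() is on the same scale as a histogram drawn with freq=FALSE.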
# overlay normal curve with x-lab and ylim # colored normal curve
# Uses the observed mean and standard deviation for plotting the normal curve
m<-mean(u) ;std<-sqrt(var(u))
hist(u, density=20, breaks=20, prob=TRUE, xlab="x-variable", col="red",
ylim=c(0, 0.7), main="normal curve over histogram")
curve(dnorm(x, mean=m, sd=std), col="darkblue", lwd=2, add=TRUE)
hist(u, density=10, breaks=20, col="red", prob=TRUE, xlab="x-variable", ylim=c(0,0.8),
main="Density curve over histogram")
lines(density(u),col = "blue")
# Kernel Density Plot
u<- rnorm(100)
d <- density(u) # returns the density data
plot(d)
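The smoothness of a kernel density estimate depends on its bandwidth. As a sketch, the adjust argument of density() rescales the default bandwidth. The object names and the seed below are illustrative.

```r
set.seed(1)
u <- rnorm(100)

d_default <- density(u)                # default bandwidth
d_smooth  <- density(u, adjust = 2)    # twice the default bandwidth
d_rough   <- density(u, adjust = 0.5)  # half the default bandwidth

plot(d_default, main = "Effect of bandwidth", ylim = c(0, 0.6))
lines(d_smooth, col = "blue", lty = 2)
lines(d_rough,  col = "red",  lty = 3)
legend("topright", legend = c("default", "adjust = 2", "adjust = 0.5"),
       col = c("black", "blue", "red"), lty = 1:3)
```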
Boxplots
Boxplots are also a useful tool for studying data. A boxplot shows the
median, quartiles and possible outliers.
The R command is boxplot(), used here on the same variable as the
histogram:
# Basic boxplot
boxplot(u, xlab="my variable", boxwex=.4)
boxplot(u, xlab="my variable", boxwex=.6, col="blue", border="red", lty=2,
        lwd=2)
## we create data: three variables
u1 <- rnorm(100)      ## 100 random numbers from the standard normal distribution
u2 <- rchisq(100,5)   ## 100 random numbers from a chi-squared distribution with 5 df (mean 5)
u3 <- rnorm(100,5,1)  ## 100 random numbers from a normal distribution with mean 5, sd 1
boxplot(u1,u2,u3, boxwex=.4)
boxplot(u1,u2,u3, boxwex=c(.2,.4,.6),col=c("red","blue","green"))
variablename<-c("low","medium", "high")
boxplot(u1,u2,u3,names=variablename,boxwex=c(.2,.4,.6), col=c("red","blue","green"),
ylim=c(-5, 20), xlab="variable status")
boxplot(u1,u2,u3,names=variablename, boxwex=c(.2,.4,.6),col=c("red","blue","green"),
        ylim=c(-5, 20),xlab="variable status", notch = TRUE)
## try
boxplot(u, xlab="my variable", pars = list(boxwex = 0.5, staplewex = .5, outwex = 0.5),plot = F)
boxplot(u, xlab="my variable", pars = list(boxwex = 0.5, staplewex = .5, outwex = 0.5),plot = T)
?boxplot
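With plot = FALSE, boxplot() draws nothing and returns the numbers behind the picture. A small sketch; the seed is arbitrary.

```r
set.seed(1)
u <- rnorm(100)

b <- boxplot(u, plot = FALSE)

b$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
b$out     # points beyond the whiskers, drawn individually as outliers
b$n       # number of observations
```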
Barchart (or barplot)
The R command is barplot()
MPCE <- c(400, 300, 600, 550, 425)
Suppose the values in MPCE are the average MPCE of some states, and each
value is to be labelled with the name of its state. The names are assigned with:
names(MPCE) <- c("UP","MP","Punjab","TN","WB")
The double quotation marks indicate that the names are character strings,
not numbers.
barplot(MPCE, names=names(MPCE), ylab="MPCE (Rs)",col="blue")
barplot(MPCE, names=names(MPCE),ylab="MPCE (Rs)", col = c("blue","red","gray","orange","black"))
[Figure: bar chart of MPCE (Rs) for the states UP, MP, Punjab, TN and WB]
barplot(MPCE, space=2,names=names(MPCE),xlab="States", ylab="MPCE (Rs)", col =
c("blue","red","gray","orange","black"))
?barplot
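barplot() returns the x positions of the bar midpoints, which makes it easy to annotate the bars. A sketch reusing the MPCE values from above; the name mid and the ylim headroom are illustrative choices.

```r
MPCE <- c(UP = 400, MP = 300, Punjab = 600, TN = 550, WB = 425)

# barplot() returns the x positions of the bar midpoints
mid <- barplot(MPCE, ylab = "MPCE (Rs)", col = "blue",
               ylim = c(0, 700))          # headroom for the labels

# Write each value just above its bar
text(mid, MPCE, labels = MPCE, pos = 3)
```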
You can plot more than one curve on a single plot, and label them via a
legend:
range <- seq(-10,10, by = 0.001)
norm1 <- dnorm(range, mean=0, sd=1)
norm2 <- dnorm(range, mean=1, sd=2)
plot(range,norm1, type="l", lty=1, col="red", main="Two Normal Distributions",
xlab="Range", ylab="Probability Density")
points(range, norm2, type="l", lty=2,col="blue")
legend(x=-10,y=0.4,legend= c("N(0,1)", "N(1,2)"), lty=c(1,2),col=c("red","blue"))
curve()
The function curve() draws a curve corresponding to a given function
If the function is written within curve() it needs to be a function of x
If you want to use a multiple argument function, use x for the argument
you wish to plot over
# Plot a 5th order polynomial
curve(3*x^5-5*x^3+2*x, from=-1.25, to=1.25, lwd=2, col="blue")
# Plot the gamma density
curve(dgamma(x, shape=2, scale=1), from=0, to=7, lwd=2, col="red")
# Plot multiple curves, notice that the first curve determines the x-axis
curve(dnorm, from=-3, to=5, lwd=2, col="red")
curve(dnorm(x, mean=2), lwd=2, col="blue", add=TRUE)
# Add vertical lines at the means
lines(c(0, 0), c(0, dnorm(0)), lty=2, col="red")
lines(c(2, 2), c(0, dnorm(2, mean=2)), lty=2, col="blue")
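For a function with several arguments, x is used for the one that is plotted over and the others are fixed in the call. The logistic() helper below is hypothetical and defined only for this sketch.

```r
# Hypothetical helper: a logistic curve with two extra parameters
logistic <- function(x, mid, rate) 1 / (1 + exp(-rate * (x - mid)))

# x is the plotting variable; mid and rate are fixed in each call
curve(logistic(x, mid = 0, rate = 1), from = -6, to = 6,
      lwd = 2, col = "red", ylab = "logistic(x)")
curve(logistic(x, mid = 1, rate = 3), lwd = 2, col = "blue", add = TRUE)
```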
# Clean out the workspace
rm(list=ls())
# List objects in workspace
ls()
# File paths are relative to the working directory
# Get or set the working directory
getwd()
# setwd() needs a path argument,
# e.g. setwd("C:/Documents and Settings/ Desktop")
Saving Graphs
Graphs can be saved in several formats, such as PDF, JPEG and BMP, by
using pdf(), jpeg() and bmp(), respectively
Files are written to the current working directory unless a full path is given
In the R GUI, a graph can also be saved by choosing File -> Save as
png(), bmp(), jpeg() and tiff() are the graphics devices for bitmap files
png(file="My Histogram.png", width=400, height=350) # Start graphics device
par(mar=c(5,4,2,2)+0.1) # margin size c(bottom, left, top, right)
m <- mean(u); std <- sqrt(var(u))
hist(u, density=20, breaks=20, prob=TRUE, xlab="x-variable", col="red", ylim=c(0, 0.7))
curve(dnorm(x, mean=m, sd=std), col="darkblue", lwd=2, add=TRUE)
dev.off() # Stop graphics device
# Other devices are opened the same way:
# bmp(filename = "plot.bmp")
# jpeg(filename = "plot.jpg")
# A pdf device holds all figures in a single file, with one graph on each page:
# pdf("C://SavingExample.pdf", width=7, height=5)
# Create multiple pdfs of figures, with one pdf per figure
pdf(width=7, height=5, onefile=FALSE)
x <- rnorm(100)
hist(x, main="Histogram of X")
plot(x, main="Scatterplot of X")
dev.off() # Stop graphics device
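For comparison, a sketch of the single-file case with an explicit file name. The file is written under tempdir(), and the name two_pages.pdf is only illustrative.

```r
out_file <- file.path(tempdir(), "two_pages.pdf")  # illustrative file name

pdf(out_file, width = 7, height = 5)   # onefile = TRUE is the default
x <- rnorm(100)
hist(x, main = "Histogram of X")       # page 1
plot(x, main = "Index plot of X")      # page 2
dev.off()                              # close the device to finish the file

file.exists(out_file)
```

The file is only complete once dev.off() has been called, so forgetting it is a common reason for an empty or unreadable PDF.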
Packages
Packages are collections of R functions, data, and compiled code in a
well-defined format. The directory where packages are stored is called
the library
The R distribution comes with base packages (for example, stats and grid)
and some recommended add-on packages (for example, boot, nlme, foreign,
MASS and spatial)
These default packages implement standard statistical functionality, for
example, linear models, classical tests etc.
Packages not included in the distribution can be downloaded and
installed directly from the R prompt
Once installed, they have to be loaded into the session to be used
At the time these slides were prepared, the CRAN package repository had 4348 packages
library() # To see all installed packages
See help("INSTALL") or help("install.packages") in R for information on
how to install packages from this repository
Adding Packages
Choose Install Packages from the Packages menu
Select a CRAN Mirror
Select a package (e.g. car)
Then use the library(package) function to load it for use (e.g.
library(car))
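The same steps can be done from the R prompt. The sketch below defines a hypothetical helper, is_installed(), which is not part of R. The install step is commented out because it needs network access and a CRAN mirror.

```r
# Hypothetical helper: TRUE if a package is installed, without attaching it
is_installed <- function(pkg) requireNamespace(pkg, quietly = TRUE)

is_installed("stats")   # part of every R installation

# Install only when missing, then load:
# if (!is_installed("car")) install.packages("car")
# library(car)
```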
Load R PACKAGES
Alternative way
Load from local drive, first download from site
Package car (Companion to Applied Regression)
library(car)
Before starting with the use of any package it is advisable to go through its
documentation.
http://cran.r-project.org/web/packages/car/index.html
http://cran.r-project.org/web/packages/car/car.pdf
Creating Your Own Package
We may want to share our code with other people, or simply make it easier
to use ourselves. There are two popular ways of starting a new package:
Load all functions and data sets you want in the package into a clean
R session, and run package.skeleton(). The objects are sorted into
data and functions, skeleton help files are created for them using
prompt() and a DESCRIPTION file is created. The function then prints
out a list of things for you to do next
Create it manually, which is usually faster for experienced developers
Structure of a package
The extracted sources of an R package are simply a directory
somewhere on your hard drive. The directory has the same name as the
package and the following contents:
A file named DESCRIPTION with descriptions of the package, author,
and license conditions in a structured text format that is readable by
computers and by people
A man/ subdirectory of documentation files.
An R/ subdirectory of R code.
A data/ subdirectory of datasets.
Less commonly it contains
A src/ subdirectory of C, Fortran or C++ source
exec/ for other executables (eg Perl or Java)
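A small sketch of the package.skeleton() route, which also shows the resulting directory layout. The package name demoPkg and the function add_one are made up, and everything is written under tempdir().

```r
# Put one illustrative function into a private environment
env <- new.env()
assign("add_one", function(x) x + 1, envir = env)

# Create the skeleton under a temporary directory
package.skeleton(name = "demoPkg", list = "add_one",
                 environment = env, path = tempdir(), force = TRUE)

# Inspect the generated layout
list.files(file.path(tempdir(), "demoPkg"), recursive = TRUE)
```

The listing should include a DESCRIPTION file together with R/ and man/ subdirectories, matching the structure described above.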
Simple Scatterplot
?mtcars
mtcars
attach(mtcars)
plot(wt, mpg, main="Scatterplot Example", xlab="Car Weight",
     ylab="Miles Per Gallon", pch=19)
# Add fit lines
# regression line (y~x)
abline(lm(mpg~wt), col="red")
# lowess line (x,y) : Normally a local linear polynomial fit is used
lines(lowess(wt,mpg), col="blue")
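The same figure can be produced without attach(), which avoids leaving mtcars on the search path. This is an alternative sketch using the formula interface and the data argument, not the approach used on the slide.

```r
# Same plot without attach(): variables are looked up in mtcars directly
plot(mpg ~ wt, data = mtcars, pch = 19,
     xlab = "Car Weight", ylab = "Miles Per Gallon",
     main = "Scatterplot Example")

fit <- lm(mpg ~ wt, data = mtcars)
abline(fit, col = "red")                             # regression line
lines(lowess(mtcars$wt, mtcars$mpg), col = "blue")   # lowess smoother

coef(fit)   # intercept and slope of the fitted line
```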
The scatterplot( ) function in the car package offers many enhanced
features, including fit lines, marginal box plots, conditioning on a factor,
and interactive point identification
# Enhanced Scatterplot of mpg vs. weight by number of Car cylinders
# Load package car
library(car)
scatterplot(mpg ~ wt | cyl, data=mtcars, xlab="Weight of Car",
            ylab="Miles Per Gallon", main="Enhanced Scatter Plot", labels=row.names(mtcars))
Scatterplot Matrices
# Basic Scatterplot Matrix
pairs(~mpg+disp+drat+wt,data=mtcars, main="Simple Scatterplot Matrix")
The car package can condition the scatterplot matrix on a factor, and optionally
include lowess and linear best fit lines, and boxplot, densities, or histograms in
the principal diagonal, as well as rug plots in the margins of the cells.
# Scatterplot Matrices from the car Package
library(car)
scatterplotMatrix(~mpg+disp+drat+wt|cyl, data=mtcars, main="Three Cylinder Options")
The gclus package provides options to rearrange the variables so that
those with higher correlations are closer to the principal diagonal. It can
also color code the cells to reflect the size of the correlations.
# Scatterplot Matrices from the gclus Package
library(gclus)
dta <- mtcars[c(1,3,5,6)] # get data
dta.r <- abs(cor(dta)) # get correlations
dta.col <- dmat.color(dta.r) # get colors
# reorder variables so those with highest correlation are closest to the diagonal
dta.o <- order.single(dta.r)
cpairs(dta, dta.o, panel.colors=dta.col, gap=.5,
       main="Variables Ordered and Colored by Correlation")
High Density Scatterplots
When there are many data points and significant overlap, scatterplots
become less useful
There are several approaches that can be used when this occurs
The hexbin(x,y) function in the hexbin package provides bivariate
binning into hexagonal cells
# High Density Scatterplot with Binning
# Load hexbin package
library(hexbin)
x <- rnorm(1000)
y <- rnorm(1000)
bin<-hexbin(x, y, xbins=50)
plot(bin, main="Hexagonal Binning")
[Figure: "Hexagonal Binning" plot of y against x, with a legend of counts per hexagonal cell]
Another option for a scatterplot with significant point overlap is the
sunflowerplot.
See help(sunflowerplot) for details
# High Density Scatterplot with Color Transparency
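One way to use color transparency in base graphics is to give rgb() an alpha value below 1. This is a sketch with simulated data; the alpha level of 0.2 and the seed are arbitrary choices.

```r
set.seed(1)
x <- rnorm(1000)
y <- rnorm(1000)

# rgb() with alpha < 1 gives a semi-transparent colour; overlapping
# points then appear darker where the data are dense
col_alpha <- rgb(0, 0, 1, alpha = 0.2)

plot(x, y, pch = 16, col = col_alpha,
     main = "Scatterplot with Color Transparency")
```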
3D Scatterplots
# 3D Scatterplot
# Load package scatterplot3d
library(scatterplot3d)
attach(mtcars)
scatterplot3d(wt,disp,mpg, color="red", col.axis="blue", pch=16,
col.grid="lightblue", main="3D Scatterplot")
# 3D Scatterplot with Coloring and Vertical Drop Lines
library(scatterplot3d)
attach(mtcars)
scatterplot3d(wt,disp,mpg, pch=16, highlight.3d=TRUE, type="h",col.axis="blue",
              main="3D Scatterplot")
#3D Scatterplot with Coloring and Vertical Lines and Regression Plane
library(scatterplot3d)
attach(mtcars)
s3d <-scatterplot3d(wt,disp,mpg, pch=16, highlight.3d=TRUE,
type="h", main="3D Scatterplot")
fit <- lm(mpg ~ wt+disp)
s3d$plane3d(fit)
Spinning 3D Scatterplots
You can also create an interactive 3D scatterplot using the plot3d(x,
y, z) function in the rgl package
It creates a spinning 3D scatterplot that can be rotated with the
mouse
The first three arguments are the x, y, and z numeric vectors
representing points
col= and size= control the color and size of the points respectively
# Load package rgl
library(rgl)
plot3d(wt, disp, mpg, col="red", size=3)
You can produce a similar plot with the scatter3d(x, y, z) function, available
through the Rcmdr package (in current releases the function itself is provided
by the car package).