
Comp 1 HW

This document provides an introduction and overview of the basics of R statistical software. It explains how to perform basic arithmetic and assign values to variables. It also demonstrates how to create and manipulate vectors, including accessing elements, calculating properties like length and sum, and applying functions element-wise. Finally, it shows how to create simple plots by passing vectors of x and y values to the plot function. The goal is to familiarize the reader with fundamental R tools that will be used throughout the course.

Uploaded by Tienanh Nguyen
Copyright © All Rights Reserved

Computing Assignment #1 Introduction to R

Premise & Directions: This assignment is an opportunity to use the R statistical package and learn
some basic tools within the software that we will utilize throughout the class. The write-up of this
assignment is lengthy because it includes many examples and explanations to guide you. You should
see that in many ways, R is nothing more than a fancy graphing calculator with a full keyboard.
Throughout this assignment (and in some future assignments) I will include R code, R output and
comments along with the problems. To distinguish between the three, the following fonts will be used:
Text in this font is general comments, description and questions
Text in this font will be R commands
Text in this font will be output from R
If you have not already done so, complete Computing Assignment #0, which consists of installing the R
software and the RStudio program on your personal computer. You are also free to use one of the
computers on campus to complete these and future assignments.

Part 1 The Basics

As mentioned above, R can be considered a very fancy graphing calculator. It can therefore be used for
arithmetic, both simple & more complicated:

2 + 8
[1] 10
7 - 19
[1] -12
4 * 3
[1] 12
36 / 8
[1] 4.5
exp(1)
[1] 2.718282
exp(3)
[1] 20.08554
log(exp(1))
[1] 1
log(3)
[1] 1.098612
sin(3 * pi / 2)
[1] -1
sin(pi)
[1] 1.224647e-16

Note from trigonometry that sin(π) = 0, yet here we get a non-zero answer. This is because computer
software stores numbers such as π with finite precision and approximates functions such as sin, cos,
log, etc., so occasionally there is a small numeric error. Note, however, the magnitude of the returned
value (on the order of 10^-16). For our class, these potential numeric errors should not be too
concerning, but in some real-world applications they can lead to erroneous results (imagine if we
needed to calculate 1 trillion × sin(π)!).
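A common way to live with such tiny errors is to compare numbers with a tolerance instead of testing exact equality. The sketch below uses base R's all.equal() and sinpi() (sinpi(t) computes sin(πt) without first rounding π); neither function is used elsewhere in this assignment, so treat this as an optional aside.

```r
# sin(pi) is not exactly zero because pi itself is only an approximation
x <- sin(pi)
x == 0                      # FALSE: exact comparison sees the tiny error
isTRUE(all.equal(x, 0))     # TRUE: comparison within a small tolerance
abs(x) < 1e-12              # TRUE: a hand-rolled tolerance check

# The error grows when the result is scaled up
1e12 * sin(pi)              # roughly 1.2e-4, no longer negligible

# sinpi(t) computes sin(pi * t) and returns exactly 0 at whole numbers
sinpi(1) == 0               # TRUE
```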
Part 2 Assignments and Objects

In many applications it may be convenient to save a specific value by name. For instance, I am
approximately 70.5 inches tall; we can save this value by assigning it to a variable name, such as
my.height:

my.height <- 70.5

Note that the <- symbols (less than and dash) are used to assign the value 70.5 to the variable name
my.height. A variable name should begin with a letter and can include letters, numbers, periods and
underscores. R is case sensitive (so my.height and My.Height are different variables). Each variable
is an object of a certain class or type; in the case of my.height, it is simply a number, or what is
called a numeric type:

class(my.height)
[1] "numeric"
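To see the case sensitivity mentioned above in action, here is a small sketch; My.Height and MY.HEIGHT are made-up names used only for illustration.

```r
my.height <- 70.5
My.Height <- 180            # a different object: names are case sensitive

my.height                   # still 70.5
exists("My.Height")         # TRUE
exists("MY.HEIGHT")         # FALSE: no object with this capitalization
class(My.Height)            # "numeric"
```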

R has many types available, and you can create your own (this is not required for this course). For
instance, you can have character strings or logical (Boolean) variables:

my.name <- "Thomas"
class(my.name)
[1] "character"
my.name
[1] "Thomas"
my.status <- TRUE
class(my.status)
[1] "logical"
my.status
[1] TRUE

There are other object types that we will see later in this class. At any point we can get a list of the
current objects in R with ls():

ls()
[1] "my.height" "my.name"   "my.status"
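The companion to ls() is rm(), which deletes an object by name. A minimal sketch, reusing the three objects created above:

```r
my.height <- 70.5
my.name <- "Thomas"
my.status <- TRUE

rm(my.status)               # delete one object by name
ls()                        # my.status is no longer listed
"my.status" %in% ls()       # FALSE
```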

Part 3 Vectors

A vector has several meanings in the world of science (physics and linear algebra, for instance). In R, a
vector can simply be thought of as a list (a list is also an object type the more advanced user may
consider) that houses several elements. These elements can be of almost any object type, but all the
elements of one vector must share the same type. You can easily construct a vector using the c()
function in R, which combines elements into a vector.

c(1, 2, 4, 5, 6)
[1] 1 2 4 5 6

Of course, we can store the vector as a variable.


my.vector <- c(1, 2, 4, 5, 6)
my.vector
[1] 1 2 4 5 6

We can create vectors of other types as well:

my.name.chars <- c("T", "h", "o", "m", "a", "s")
my.name.chars
[1] "T" "h" "o" "m" "a" "s"
vector.logicals <- c(TRUE, TRUE, FALSE, FALSE, TRUE, TRUE)
vector.logicals
[1]  TRUE  TRUE FALSE FALSE  TRUE  TRUE
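What happens if you do mix types inside c()? R does not raise an error; it silently converts (coerces) every element to one common type, roughly in the order logical, then numeric, then character. A small sketch (mixed.num and mixed.chr are illustrative names):

```r
# Mixing logicals and numbers: the logicals become 1 and 0
mixed.num <- c(1, TRUE, FALSE)
mixed.num                   # 1 1 0
class(mixed.num)            # "numeric"

# Adding a character string turns everything into text
mixed.chr <- c(1, "a", TRUE)
mixed.chr                   # "1" "a" "TRUE"
class(mixed.chr)            # "character"
```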

Once a vector is in R, we can access certain elements of the vector using bracket notation [].

my.vector[3]
[1] 4

That is, the third element of the vector is the number 4.

my.name.chars[3:5]
[1] "o" "m" "a"

Note the notation: 3:5 reports the third, fourth and fifth elements of the vector. This is special
notation within R; 3:5 creates a vector of the integers from 3 to 5. Here are some examples:

3:5
[1] 3 4 5
1:10
[1]  1  2  3  4  5  6  7  8  9 10
49:68
[1] 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68
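Bracket indexing accepts any vector of positions, not just a range. A short sketch of a few variations, including negative indices, which drop elements instead of selecting them:

```r
my.vector <- c(1, 2, 4, 5, 6)
my.name.chars <- c("T", "h", "o", "m", "a", "s")

my.vector[c(1, 5)]              # first and fifth elements: 1 6
my.vector[-1]                   # a negative index drops that element: 2 4 5 6
my.name.chars[-(3:5)]           # drop elements 3 to 5: "T" "h" "s"
my.vector[length(my.vector)]    # last element: 6
```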

A few functions are designed to help us manage vectors, two that we will use frequently are length()
and sum():

length(my.vector)
[1] 5
sum(my.vector)
[1] 18

Note that the sum of all the elements in my.vector is 18; this can easily be confirmed by hand
(1+2+4+5+6=18). Also note the length of the vector is 5 because there are 5 elements. Another
example:

length(vector.logicals)
[1] 6
sum(vector.logicals)
[1] 4

The length of vector.logicals is 6, which shouldn't be a surprise given how we defined it. However,
note the output of the sum() function. R treats logicals as Booleans (that is, TRUE & FALSE) but, in
arithmetic, also as the integers 1 & 0. So sum(vector.logicals) counts the TRUE values. This will be
handy later in the class to calculate proportions. We can also use the sum() function to confirm
Gauss's formula, 1 + 2 + ... + n = n(n + 1)/2, for n = 100:

sum(1:100)
[1] 5050
100 * 101 / 2
[1] 5050
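The check above covers a single n. As a sketch, cumsum() (which returns running totals) lets us confirm the formula for every n from 1 to 100 at once:

```r
n <- 1:100
# cumsum(n) is 1, 1+2, 1+2+3, ..., so its k-th element equals sum(1:k);
# compare each running total with k(k+1)/2
all(cumsum(n) == n * (n + 1) / 2)   # TRUE
```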

Another way to construct vectors of numbers in R is through the seq() function. It accepts three
parameters: a starting point, an ending point, and a step size. Here are some examples:

seq(1, 10, 1)
[1]  1  2  3  4  5  6  7  8  9 10
seq(0, 1, 0.1)
[1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
seq(-1, 1.25, 0.25)
[1] -1.00 -0.75 -0.50 -0.25  0.00  0.25  0.50  0.75  1.00  1.25
seq((4/3), (-1/3), (-1/3))
[1]  1.3333333  1.0000000  0.6666667  0.3333333  0.0000000 -0.3333333
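In base R the three parameters are named from, to and by, and naming them is optional but makes a call easier to read. seq() also accepts length.out in place of by when you want a fixed number of equally spaced points. A brief sketch:

```r
# Named arguments: start at 0, stop at 1, step by 0.25
seq(from = 0, to = 1, by = 0.25)         # 0.00 0.25 0.50 0.75 1.00

# Same points, specified by how many we want instead of the step
seq(from = 0, to = 1, length.out = 5)    # 0.00 0.25 0.50 0.75 1.00

# A negative step counts down
seq(from = 10, to = 0, by = -2)          # 10 8 6 4 2 0
```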

Another way to construct a vector is using the rep() command. It takes two parameters: the element
(or vector) to repeat and the number of times to repeat it, respectively.

c(1, 1, 1, 1, 1, 1)
[1] 1 1 1 1 1 1
rep(1, 6)
[1] 1 1 1 1 1 1
rep(c(1, 2, 3), 3)
[1] 1 2 3 1 2 3 1 2 3
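rep() has a couple of named arguments that are worth knowing: times repeats the whole vector (this is what the second parameter above does), while each repeats every element in place. times may also be a vector giving a separate count for each element. A quick sketch:

```r
rep(c(1, 2, 3), times = 3)          # 1 2 3 1 2 3 1 2 3  (same as rep(c(1, 2, 3), 3))
rep(c(1, 2, 3), each = 3)           # 1 1 1 2 2 2 3 3 3
rep(c("M", "F"), times = c(6, 4))   # "M" six times, then "F" four times
```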

Another nice feature in R is the application of a mathematical function to an entire vector. For
instance, consider our vector my.vector; we can do all of the following:

my.vector
[1] 1 2 4 5 6
sqrt(my.vector)
[1] 1.000000 1.414214 2.000000 2.236068 2.449490
log(my.vector)
[1] 0.0000000 0.6931472 1.3862944 1.6094379 1.7917595
my.vector * 2 + 3
[1]  5  7 11 13 15
As you can see, the vector is a valuable structure within R and we will be utilizing it throughout the
semester. It is a handy way to store data, sequences and other items.

The handling of vectors in R is quite user friendly. In most cases, we can perform an operation over
every element in the vector simply by running a function with the vector as input (see sqrt & log
examples above). Square all the elements in the vector:

my.vector^2
[1]  1  4 16 25 36
my.vector * my.vector
[1]  1  4 16 25 36

From linear algebra, the second line may seem confusing (in linear algebra, the multiplication of two
vectors results in a scalar or a matrix, but here we get another vector). By default R performs
component-wise arithmetic on the elements of the vector. Linear algebra operators are handled a little
differently and will be addressed later. The component-wise arithmetic can be handy, for instance:

my.vector > 4
[1] FALSE FALSE FALSE  TRUE  TRUE
my.vector[my.vector > 4]
[1] 5 6
sum(my.vector[my.vector > 4])
[1] 11

The first line checks whether each element of my.vector is greater than 4 and returns a vector of
logicals of equal length. The second command uses those TRUE/FALSE values as the index to capture
only the fourth and fifth elements of my.vector (the values 5 and 6). The last line sums those two values.
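Two related tools fit naturally here. which() reports the positions of the TRUE values (confirming they are the fourth and fifth elements), and, because TRUE counts as 1, sum() and mean() of a logical vector give a count and a proportion. A short sketch:

```r
my.vector <- c(1, 2, 4, 5, 6)

which(my.vector > 4)        # positions of the TRUEs: 4 5
sum(my.vector > 4)          # how many elements exceed 4: 2
mean(my.vector > 4)         # proportion exceeding 4: 2/5 = 0.4
```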

The next group of code within Part 3 is a bit advanced, but we will use this functionality later in the
course. It is not necessary for this assignment; please feel free to skip it if feeling overwhelmed.

Sometimes we need to do more to each element than apply a simple function. For this we will use the
sapply() function, which takes two inputs: the first is the vector and the second is the function to be
performed on each element. A rudimentary example we've seen before is:

sapply(my.vector, sqrt)
[1] 1.000000 1.414214 2.000000 2.236068 2.449490

In most cases it is more interesting to define our own function. Perhaps we wish to take the
square root of 3 plus the square of each element. First we need to define our function:

my.fun <- function(x) {
  sqrt(x^2 + 3)
}
sapply(my.vector, my.fun)
[1] 2.000000 2.645751 4.358899 5.291503 6.244998
Of course this example could be done with the simpler operation:

sqrt(my.vector^2 + 3)
[1] 2.000000 2.645751 4.358899 5.291503 6.244998

but later we may wish to perform more complicated evaluations and build our own functions.
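Here is one sketch of a case where sapply() genuinely helps: a function built around if/else, which looks at one value at a time, so it cannot simply be called on the whole vector. The helper size.label() and its cutoff argument are hypothetical names chosen for illustration. Any extra named arguments given to sapply() are passed on to the function.

```r
my.vector <- c(1, 2, 4, 5, 6)

# Hypothetical helper: label one value relative to a cutoff (default 4)
size.label <- function(x, cutoff = 4) {
  if (x > cutoff) "big" else "small"
}

sapply(my.vector, size.label)              # "small" "small" "small" "big" "big"
sapply(my.vector, size.label, cutoff = 1)  # "small" "big" "big" "big" "big"
```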

Part 4 Plots

Those familiar with a TI-84 calculator may realize many of the steps in this assignment are
essentially graphing calculator type methods. The next step is to build graphs! The basic tool in R
for building graphics is the plot() command. By default it looks for two vectors of equal length:
the first represents the x-axis and the second represents the y-axis. Here is a basic example
resulting in the graph to the right:

x.vals <- seq(2, 12, 2)
y.vals <- seq(110, 160, 10)
plot(x.vals, y.vals)

A more interesting example may involve data (note: the example below uses fake data). Suppose a
statistician sampled 10 adults (6 males and 4 females), measuring their heights (in) and weights (lbs).
These two sets of measurements are recorded in two vectors whose elements correspond to an
individual:

heights <- c(70, 75, 59, 66, 63, 65, 68, 72, 62, 70)
weights <- c(170, 250, 114, 165, 135, 140, 190, 190, 122, 185)
plot(heights, weights)
In practical applications we likely want to jazz up our graphics (perhaps to include in a report). First
we create a third vector corresponding to the birth gender of the individuals. Here I use a 0 to indicate a
male and a 1 to indicate a female. I will then use this vector of genders to specify the color of the
points in the plot. A color of 0 is the background color (so the point is effectively invisible) and a color
of 1 is black, so I add the number 5 to each of the genders, giving a sequence of 5's & 6's, which
correspond to the colors light blue & pink/purple, respectively. I also change the point type using the
pch parameter; I set it to 16, which is a closed circle, compared to the default open circle. You will also
see that I label the axes in a more appropriate fashion. We also use the legend() command to add a
legend to our plot. The result is a graphic that is easy to read and provides quite a bit of information.

genders <- c(0, 0, 1, 0, 1, 1, 0, 0, 1, 0)
genders + 5
[1] 5 5 6 5 6 6 5 5 6 5
plot(heights, weights, col = genders + 5, pch = 16,
     xlab = "Height (in)", ylab = "Weight (lbs)",
     main = "Heights vs Weights")
legend("topleft", legend = c("Male", "Female"), pch = 16, col = c(0, 1) + 5)
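An alternative to the "add 5" trick is to look up one named color per point, which also makes it easy to choose colors that do not depend on R's default palette. The sketch below additionally writes the figure to a PDF file with pdf() and dev.off(), which is one way to get a plot into a report. The colors "deepskyblue" and "hotpink" and the temporary output file are my own choices for illustration, not part of the original example.

```r
heights <- c(70, 75, 59, 66, 63, 65, 68, 72, 62, 70)
weights <- c(170, 250, 114, 165, 135, 140, 190, 190, 122, 185)
genders <- c(0, 0, 1, 0, 1, 1, 0, 0, 1, 0)   # 0 = Male, 1 = Female

# genders + 1 is 1 for males and 2 for females, so it picks one color each
group.cols <- c("deepskyblue", "hotpink")
point.cols <- group.cols[genders + 1]

# Draw into a PDF file instead of the screen; tempfile() just avoids
# cluttering your working folder (use a real file name for a report)
out.file <- tempfile(fileext = ".pdf")
pdf(out.file, width = 6, height = 5)
plot(heights, weights, col = point.cols, pch = 16,
     xlab = "Height (in)", ylab = "Weight (lbs)",
     main = "Heights vs Weights")
legend("topleft", legend = c("Male", "Female"), pch = 16, col = group.cols)
dev.off()   # close the file so the PDF is complete
```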
R has other options for plotting. For instance, suppose we wished to plot a function, perhaps the sine
function over the interval -4 to 4 (the code below plots sin(πx), which completes four full periods on
this interval). First we construct the appropriate interval using a vector and then calculate the sine
function over the entire interval.

x.vals <- seq(-4, 4, 0.01)
y.vals <- sin(x.vals * pi)
plot(x.vals, y.vals, type = 'l', main = "Sine Function")

Note that in the plot() function we specify the type as 'l' (lowercase L). This says to draw a line,
essentially by connecting the dots. In our example, since the ordered pairs are in sequential order along
the x-axis, it draws a smooth line. Also note the step size (0.01) in the seq() command; a smaller step
would provide a smoother graph.
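Base R also offers curve(), which takes an expression in x and does the seq()/evaluate/plot steps for you; its n argument plays the role of the step size (more points give a smoother curve). In the sketch below, n = 801 is chosen so the points match seq(-4, 4, 0.01), and the drawing is sent to a temporary PDF file; leave out the pdf() and dev.off() lines to see it in RStudio's Plots pane.

```r
# Draw to a temporary file (omit pdf() and dev.off() to draw on screen)
out.file <- tempfile(fileext = ".pdf")
pdf(out.file)
curve(sin(pi * x), from = -4, to = 4, n = 801, main = "Sine Function")
dev.off()

# seq(-4, 4, 0.01) produces the same number of points
length(seq(-4, 4, 0.01))    # 801
```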
Later in the course we will introduce more advanced graphical techniques using the ggplot2
package.

Part 5 Matrices

You may note in the previous example we used three different vectors to store all of our information on
one sample. This may seem cumbersome. R has a tool to help us, a matrix, which you can think of as a
vector of vectors. Matrices can be constructed in a number of ways, the easiest (in my mind) is with the
cbind() and rbind() commands (bind by columns and rows, respectively).

data1 <- cbind(heights, weights, genders)
data2 <- rbind(heights, weights, genders)
data1
      heights weights genders
 [1,]      70     170       0
 [2,]      75     250       0
 [3,]      59     114       1
 [4,]      66     165       0
 [5,]      63     135       1
 [6,]      65     140       1
 [7,]      68     190       0
 [8,]      72     190       0
 [9,]      62     122       1
[10,]      70     185       0
data2
        [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
heights   70   75   59   66   63   65   68   72   62    70
weights  170  250  114  165  135  140  190  190  122   185
genders    0    0    1    0    1    1    0    0    1     0
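Matrices are indexed with two positions, [row, column]; leaving one blank selects everything in that dimension, and columns created by cbind() can also be selected by name. A short sketch using the same data:

```r
heights <- c(70, 75, 59, 66, 63, 65, 68, 72, 62, 70)
weights <- c(170, 250, 114, 165, 135, 140, 190, 190, 122, 185)
genders <- c(0, 0, 1, 0, 1, 1, 0, 0, 1, 0)
data1 <- cbind(heights, weights, genders)
data2 <- rbind(heights, weights, genders)

dim(data1)                  # 10 3  (rows, columns)
dim(data2)                  # 3 10
data1[3, ]                  # third person: heights 59, weights 114, genders 1
data1[, "weights"]          # the whole weights column
all(t(data1) == data2)      # TRUE: rbind() gives the transpose of cbind()
```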

We saw above the sapply() command for a vector. In a similar vein, the apply() function can be used for
a matrix. It expects three parameters: the matrix, a flag for row, column or both, and the function to be
performed. For instance, suppose we wished to calculate the sum of all the heights and weights.

apply(data1, 2, sum)
heights weights genders
    670    1661       4

The first parameter says to use the matrix data1, the second says to do things by columns (2 for
column, 1 for row) and the third parameter says to run the sum function. Also note that the genders
have been summed because we used a 0/1 indicator variable to distinguish genders. For the genders
variable, the sum is contextually meaningless, yet the computer just treats it as a number.
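For sums and averages, base R also has the shortcuts colSums(), rowSums() and colMeans(), which give the same results as the corresponding apply() calls; apply() remains the general tool for any other function. A sketch on the same data (note that the mean of a 0/1 column, 0.4 here, is the proportion of 1's, that is, of females in this sample):

```r
heights <- c(70, 75, 59, 66, 63, 65, 68, 72, 62, 70)
weights <- c(170, 250, 114, 165, 135, 140, 190, 190, 122, 185)
genders <- c(0, 0, 1, 0, 1, 1, 0, 0, 1, 0)
data1 <- cbind(heights, weights, genders)
data2 <- rbind(heights, weights, genders)

colSums(data1)              # same as apply(data1, 2, sum): 670 1661 4
rowSums(data2)              # data2 stores the variables in rows: 670 1661 4
colMeans(data1)             # column averages: 67.0 166.1 0.4
apply(data1, 2, max)        # any function works with apply(): 75 250 1
```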

Note: R can be considered a little like a poor man's Matlab. That is, it treats most objects as a
vector/matrix by default and can be used for linear algebra (matrix multiplication, dot products,
Kronecker products, etc...) and other numeric methods. I use the description poor man's for two
reasons: (1) R is free!!!, (2) Sometimes you get what you pay for! Matlab is more efficient (and more
accurate in some cases) when it comes to linear algebra and other applied mathematics.
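For a first taste of those linear algebra tools, the sketch below contrasts component-wise * with the matrix-product operator %*%, and shows t() (transpose) and kronecker(); all are base R. The small matrices A and B are made up for illustration.

```r
my.vector <- c(1, 2, 4, 5, 6)

my.vector * my.vector       # component-wise: 1 4 16 25 36
my.vector %*% my.vector     # dot product, returned as a 1 x 1 matrix: 82
sum(my.vector * my.vector)  # the same dot product as a plain number: 82

A <- matrix(1:4, nrow = 2)  # columns (1, 2) and (3, 4)
B <- diag(2)                # 2 x 2 identity matrix
A %*% B                     # matrix multiplication: returns A unchanged
t(A)                        # transpose: rows become columns
kronecker(A, B)             # 4 x 4 Kronecker product
```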

Work through the above examples by Wednesday August 30. You will be able to ask questions about
the code on that day.
Assignment #1 (30pts):

Do the following problems by Friday, September 1, 2017. Make sure all solutions include R code
completing the problem and abide by the computing homework guidelines.

1. Run the command help(round) to read the helpfile on the round() function. Then calculate
the following numerical results to three decimal places:
(a) (78)+ 5356+ 62
(b) ln (3)+ 2 sin( )e 3
(c) (92)4 10+ln (6)exp (1)

2. Sequences and operations:


(a) Create a vector named countby5 that is a sequence of 5 to 100 in steps of 5.
(b) Compute the square root of countby5
(c) In R, type help(log), after reading the documentation, compute the log base 5 of the
vector countby5.

3. Vectors, matrices, sequences and logical operators


(a) Assign the names x and y to the values 5 and 7, respectively. Find xy and assign the result to
z. What is the value stored in z?
(b) Create the vectors u = (1, 2, 5, 4) and v = (2, 2, 1, 1) and use them to create a new vector by
performing w = u + v
(c) Provide code to find which component of u is equal to 5
(d) Provide code to give the components of v greater than or equal to 2.
(e) Find the product of uv
(f) Explain what R does when two vectors of unequal length are multiplied together.
Specifically, what is uc(u,v)
(g) Provide code to define a sequence from 1 to 10 called G and subsequently select the first
three components of G.
(h) Define the matrix X whose rows are the u and v vectors from above.
(i) Define the matrix Y whose columns are the u and v vectors from above.
(j) Provide code that sums the rows of the matrix Y and report those values.

4. Some plotting
(a) Provide a plot of the cosine function over the domain [-10, 10].
(b) Consider the vectors u and v from question 3 as ordered pairs. Plot the points and colorize
each point with a unique color.

5. More on vectors...
(a) Create a vector that is a sequence of the values 3 to 3726 in steps of 3 (do not print out the
vector, but save it!)
(b) Use the modulus function (%%) and determine the number of elements in the vector that are
divisible by 5.
(c) What is the largest value from the elements in part (b); that is, of the elements in 3 to 3726
by 3 that are divisible by 5, which is the largest (find using code!)
(d) Sum up the elements from part (b); that is, find the sum of the elements in the sequence
from 3 to 3726 by step size 3 that are divisible by 5.
