INSTITUTE OF SCIENCE, NAGPUR.
(AN AUTONOMOUS INSTITUTE OF GOVERNMENT OF MAHARASHTRA)
CERTIFICATE COURSE
IN
“STATISTICAL PACKAGE R”
• CHAPTER 1
1. INTRODUCTION
2. R AS A STATISTICAL SOFTWARE AND LANGUAGE
3. R AS A CALCULATOR
4. R PRELIMINARIES
5. METHODS OF DATA INPUT
In today’s world, understanding statistics is essential in agriculture, biology, business,
chemistry, communications, economics, education, electronics, medicine, physics, political
science, psychology, sociology and numerous other fields.
Physicians and other health related professionals apply it to evaluate results of studies
investigating new drugs and therapies for treating diseases.
Managers use it to identify factors that help predict and measure employee performance.
In the process of solving real-life problems using statistics, three steps are necessary:
I. Selection of representative samples from the population and study of sample data to
identify patterns in the data.
II. Selection of the proper statistical model
III. Making decisions or drawing inferences from the selected model to solve the problem
under consideration.
The tremendous increase in computer power, coupled with the continued improvement and accessibility
of statistical software, has had a major impact on the analysis of statistical data. We think that the real fun
of statistics lies more in conceptualization than in calculation, and therefore we are fortunate to have
computers to do the drudge work.
To take advantage of this powerful tool, one needs good statistical software. We find the free, open-source
software and language R to be the best choice.
R Installation: R software is obtained from the Comprehensive R Archive Network (CRAN), which
may be reached from the R project web site at www.r-project.org. The files needed to install R are
distributed from this site.
Starting and Stopping R: When R is installed properly, you will see the R icon on your desktop. Click
on the R icon to start R. To end the session, type q().
2. R AS A STATISTICAL SOFTWARE AND LANGUAGE:
• Every expert describes R differently. This happens because R appeals differently to experts
in different fields. A statistician finds R easy to use for data analysis because R has built-in
commands for most of the common statistical procedures, from descriptive
statistics through correlation, regression, ANOVA, MANOVA and clustering to factor analysis,
including quality graphical procedures such as histograms, box plots, scatterplots and so on.
• The most important feature of R is that it is not a proprietary product: it is free, open-source
software distributed under the GNU General Public License, and is therefore open to everybody
to acquire, modify, upgrade or improve in any way one can think of.
How good is R in terms of its quality and precision?
• In this regard it is safe to state that R is one of the most powerful high-level languages for
carrying out statistical analysis.
• As a language it has built-in functions as well as contributed libraries that can directly be
used for most of the statistical analysis.
• Users of statistics can use the built-in help of R to learn up-to-date statistical procedures
and state-of-the-art statistical practices.
• The world being heterogeneous in terms of operating systems, R has received support for
the additional reason that it is available for several operating systems such as Windows,
Unix, Linux, Mac etc.
• R programs can be ported from one operating system to another.
R as Statistical Software: Use of statistical software is a must for handling large data sets and complex
procedures of analysis. Several statistical packages are available, such as (i) R (ii) SPSS (iii) SAS
(iv) S-PLUS (v) MINITAB, to name a few.
We strongly advocate R for the following reasons.
• R has very good computing performance.
• R is free software.
• R has an excellent built-in help system.
• The graphical environment of R is flexible and powerful, giving many possibilities for graphical analyses.
• R is a computer programming language; hence for those who are familiar with a programming language it will
be very easy, and for new users the next leap to programming is not hard.
• The R language is easy to extend with user-written functions. In fact, R can be modified by its users and its
development is open to contributors. Of course, the official sources can be modified only by the R Core Team.
• R also provides scripting and interfacing facilities.
Data types and arithmetic operators:
The usual data types available in R are known as modes. The modes are logical (Boolean
TRUE/FALSE), numeric (integers and reals), complex (real + imaginary numbers) and
character (text strings). Data analysis in R proceeds as an interactive dialogue with the
interpreter. As soon as we type a command at the prompt (>) and press the enter key, the
interpreter responds by executing the command.
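As a small illustration of modes, the sketch below stores one value of each kind and asks R for its mode with the built-in mode() function. The object names (flag, count, z, label) are arbitrary and chosen only for this example; character strings, used later in these notes for names, have mode "character".

```r
# One object of each basic mode (object names are illustrative only).
flag  <- TRUE        # logical
count <- 42          # numeric
z     <- 3 + 2i      # complex
label <- "Nagpur"    # character string

# mode() reports the mode of an object as a character string.
modes <- c(mode(flag), mode(count), mode(z), mode(label))
print(modes)
```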
The R language includes the usual arithmetic operations with the usual hierarchy. The following table
shows the common operators.
Operator Function
+ Addition
- Subtraction
* Multiplication
/ Division
^ Exponentiation
Here are some simple examples of arithmetic in R:
2+7
[1] 9
After entering 2+7 at the prompt and pressing the enter key, we get the output [1] 9.
The symbol [1] at the start of the output line is the index of the first element printed on that line;
R treats even a single number as a vector of length one. This notation will make sense once vectors are
introduced.
x <- 2+7 # The operator <- ("less than" sign followed by "minus" sign) is known as the
# assignment operator. It assigns the value of the expression on the right to the object on the left.
The object is not displayed on the screen but is stored in the active memory of R. The object will be
displayed by typing the name of the object.
x # Output is:
[1] 9
4^2-3*2
[1] 10
2^-3
[1] 0.125
It is always better to make the order of evaluation of an expression explicit by using parentheses. For
example,
1-2*3
[1] -5
(1-2) *3
[1] -3
Note: Spaces are not required to separate the elements of arithmetic expressions.
However, judicious use of spaces can help to clarify the meaning of the expressions.
Example 1) Find the distance between the two points; (2,4,6) and (4.2,7.1,8.6).
Solution: The distance between the two points, say, (x1,x2,x3) and (y1,y2,y3) is defined as,
d = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + (x_3 - y_3)^2}
R command to evaluate the expression is:
sqrt( (2-4.2)^2 + (4-7.1)^2 + (6-8.6)^2) ; # Output is:
[1] 4.605432
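The same distance can also be computed by storing each point as a vector, which avoids retyping the coordinates. This is only a sketch: it uses the c function and element-wise vector arithmetic, both described later in these notes, and the names p, q and d are chosen here for illustration.

```r
# Store the two points as vectors (names p and q are illustrative).
p <- c(2, 4, 6)
q <- c(4.2, 7.1, 8.6)

# (p - q)^2 squares each coordinate difference; sum() adds them up.
d <- sqrt(sum((p - q)^2))
print(d)
```

The printed value agrees with the result of the direct command above.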
Assignment operator: As mentioned earlier, R uses the assignment operator <- to give a data
object (or any other object) its value. The operator -> may also be used; with the -> operator,
however, the assignment is from left to right. For example,
x<-2;
The above command assigns the value 2 to object x.
x^2->y;
This command assigns the value x^2 (here 4) to object y.
Note: If a command is not complete at the end of a line, you will see the continuation prompt "+" and
you should complete the command. Commands are separated by a semicolon ";" or a new line. Comments
start with a hash mark "#"; text to the right of # is ignored by the interpreter.
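The points in this note can be tried out with a few short lines. The sketch below is illustrative only; the object names a, b and total are not from the original notes.

```r
# Two commands on one line, separated by a semicolon.
a <- 5; b <- a^2

# Rightward assignment: the value of a + b is stored in total.
a + b -> total

print(total)   # everything after the hash mark is ignored by R
```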
Comparison and logical operators: The following table shows the available comparison and logical operators.
Operator Meaning
> Is greater than
< Is less than
>= Is greater than or equal to
<= Is less than or equal to
== Is equal to
!= Is not equal to
! Logical not
& Logical and
| Logical or
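A brief sketch of these operators in use is given below. Each comparison returns a logical value (TRUE or FALSE); the object names x and r1 to r6 are chosen only for this illustration.

```r
x <- 7

r1 <- x > 5                 # greater than
r2 <- x == 7                # equal to (note the double equals sign)
r3 <- x != 7                # not equal to
r4 <- (x > 5) & (x < 6)     # logical and: both conditions must hold
r5 <- (x < 5) | (x >= 7)    # logical or: at least one condition holds
r6 <- !(x <= 7)             # logical not: reverses the result

print(c(r1, r2, r3, r4, r5, r6))
```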
Vector base of R: Like any programming language, R contains data structures. Vectors are the basic data
structures in R. A vector is a series of elements that are all of the same type. Such a series of observations is
stored in R as an ordered sequence; that is, R keeps track of the order of the elements.
This is a good thing for several reasons: (i) It is possible to make changes to the data item by item, instead of
having to enter the data set again.
(ii) Vectors are mathematical objects, so the standard arithmetic functions and operators apply to vectors on an
element-wise basis.
For example, c(1,2,3,4)/2
[1] 0.5 1.0 1.5 2.0
c(1, 2, 3, 4)/c(4, 3, 2, 1)
[1] 0.2500000 0.6666667 1.5000000 4.0000000
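Changing a vector item by item can be sketched as follows. The square-bracket index notation used here is standard R but is not covered in the text above, and the object names marks and doubled are illustrative assumptions.

```r
marks <- c(12, 15, 11, 18)

# Correct only the third observation; the other elements are untouched.
marks[3] <- 14

# Arithmetic still applies element by element to the corrected vector.
doubled <- marks * 2

print(marks)
print(doubled)
```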
Functions:
Many mathematical and statistical functions are available in R. A function has a name,
which is typed, and is then followed by a pair of parentheses. Arguments are added inside
this pair of parentheses as needed. For example, consider the assign function.
The R command:
assign("x", 3) # Assigns the value 3 to object x.
Here assign is the name of the function.
The first argument ("x") is the name of the variable to which the assignment is to be
made. The second argument is the value to be assigned. It may be noted that the usual
assignment operator <- can be thought of as a short-cut to this function.
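A short check that the two forms of assignment give the same result is sketched below; the names x, y and same are used only for illustration.

```r
# Function form: the object name is given as a character string.
assign("x", 3)

# Operator form: the usual short-cut.
y <- 3

# identical() returns TRUE when the two objects are exactly the same.
same <- identical(x, y)
print(same)
```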
Acceptable object names:
We are free to make variable names out of letters, digits and the dot or underscore
characters. A name should start with a letter (or with a dot not followed by a digit), and
no other characters or mathematical operators may be used. Needless to say, case is
important: x and X are different objects. R allows names much longer than 8 characters;
current versions of R accept names of up to 10,000 bytes.
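The naming rules can be explored with the sketch below. It relies on the base R function make.names(), which converts a string into a syntactically valid name; a candidate is valid when make.names() leaves it unchanged. All object names here are made up for the example.

```r
# Case matters: these are two different objects.
total <- 10
Total <- 20
different <- total != Total

# Dots, underscores and digits are allowed after the first letter.
sample.mean_1 <- 3.5

# A candidate name is valid if make.names() returns it unchanged.
candidates <- c("rate2", "2rate", "my-var")
is_valid <- make.names(candidates) == candidates
print(is_valid)
```

Only the first candidate passes: the second starts with a digit and the third contains a minus sign, which R would read as subtraction.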
5. METHODS OF DATA INPUT
Following are the commonly used methods of data input:
(1) c function: The most useful R command for quickly entering small data sets is the c ("combine") function.
This function combines or concatenates terms together. As an example, consider y <- c(1, 2, 3, 9, 15, 17);
Thus, the c function constructs a vector. Note that although you need the c function to construct a
vector, once you have assigned the vector to, say, y, you may assign its value to other data objects. For
example, z <- y or z <- y^2 is also permissible. The c function can also be used to construct a vector of
character strings, for example:
Names <- c("Bob", "Jack", "Jill");
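The commands of this item can be run together as in the sketch below. The vector w, which extends y with two further values, is an extra illustration not taken from the notes.

```r
y <- c(1, 2, 3, 9, 15, 17)
z <- y^2                  # element-wise square of y

# c() also accepts existing vectors, so it can extend y.
w <- c(y, 20, 25)

Names <- c("Bob", "Jack", "Jill")

print(z)
print(length(w))          # number of elements in w
print(mode(Names))        # character strings have mode "character"
```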
(2) Sequence operator and "seq" function: Both return vectors as results. The sequence
operator (:) generates consecutive numbers, while the seq function does the same thing, but more flexibly.
Examples are:
1:4
[1] 1 2 3 4
-1:2
[1] -1 0 1 2
The colon (sequence) operator has high priority within an expression; for example, 2*1:15 is the
vector (2,4,6,...,30). As an exercise, put n = 4 and compare for yourself the sequences 1:n - 1 and
1:(n - 1).
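One way to check the exercise is sketched below; the object names a and b are illustrative. Because the colon operator is evaluated before subtraction, 1:n - 1 means (1:n) - 1.

```r
n <- 4

a <- 1:n - 1      # same as (1:n) - 1, which is 0 1 2 3
b <- 1:(n - 1)    # same as 1:3, which is 1 2 3

print(a)
print(b)
```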
seq(2, 8, by = 2) # Specifies interval and increment.
[1] 2 4 6 8
seq(0, 1, length = 11) # Specifies interval and the number of elements.
[1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
Note: Parameters to seq() and many other R functions can also be given in named form, in which
case the order in which they appear is irrelevant. The first two parameters of the seq function are
named from and to. Hence the sequences seq(from = 0, to = 30, length = 15), seq(length = 15, to = 30,
from = 0) and seq(to = 30, length = 15, from = 0) are equivalent.
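The effect of named parameters can be verified with the sketch below. It uses the full parameter name length.out; the shorter form length is accepted because R matches parameter names by unique prefix. The names s1, s2 and s3 are illustrative.

```r
# Same three parameters, given in different orders or by position.
s1 <- seq(from = 0, to = 30, length.out = 15)
s2 <- seq(length.out = 15, to = 30, from = 0)
s3 <- seq(0, 30, length.out = 15)

print(identical(s1, s2))
print(identical(s1, s3))
print(length(s1))
```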
Warning: Be careful while applying simple arithmetic functions and operations to vectors. If the
operands are of different lengths, then the shorter of the two is extended by repetition (as in
c(1, 2, 3, 4) / 2 above). If the length of the longer operand is not a multiple of the length of the
shorter, then a warning message is printed, but the interpreter proceeds with the operation.
For example:
c(1, 2, 3, 4) + c(4, 3) # c(4, 3) is repeated twice (that is, c(4, 3, 4, 3) is used).
[1] 5 5 7 7
c(1, 2, 3, 4) + c(4, 3, 2) # c(4, 3, 2) is extended to c(4, 3, 2, 4)
[1] 5 5 5 8
Warning message:
In c(1, 2, 3, 4) + c(4, 3, 2) :
  longer object length is not a multiple of shorter object length
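The repetition rule can be made explicit with rep(), and the warning can be captured with tryCatch(), as in the sketch below. Both are base R functions; the object names a, b, s, same and caught are assumptions made for this example.

```r
a <- c(1, 2, 3, 4)
b <- c(4, 3)

s <- a + b                                # b is recycled to c(4, 3, 4, 3)
same <- identical(s, a + rep(b, times = 2))

# With a length-3 operand R issues a warning; capture its text here
# instead of letting it print.
caught <- tryCatch(a + c(4, 3, 2),
                   warning = function(w) conditionMessage(w))

print(s)
print(same)
print(caught)
```

Checking that length(a) is a multiple of length(b) before combining vectors is a simple way to avoid unintended repetition.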
Modification of the objects: Consider the following simple example.
x <- c(10, 12, 17);
x # Typing the name (x) of the object is equivalent to giving the command print(x).
[1] 10 12 17
x <- c(9, 12, 17); print(x);
[1] 9 12 17
If the object already exists, its previous value is erased (modification affects only the objects in the
active memory, not data on the disk).
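If the earlier value may be needed again, it can be copied to another object before the reassignment. The sketch below illustrates this; the name old is hypothetical and not part of the original example.

```r
x <- c(10, 12, 17)

old <- x              # keep a copy of the current value
x <- c(9, 12, 17)     # the previous value of x is now erased

print(x)
print(old)            # the copy is unaffected by the reassignment
```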