INTRODUCTION TO R
Shanti S. Chauhan, Ph.D.
Business Studies
SHUATS
AGENDA
• History and evolution of R
• Principle and software paradigm
• Description of R interface
• Advantages of R
• Drawbacks of R
• So why use R?
• References for learning R
HISTORY AND EVOLUTION OF R
Origins at Bell Labs in the 1970s
R developed from the S language
S Version 1
S Version 2
S Version 3
S Version 4
S was developed some 30 years ago for applied research in the high-tech industry
The steady development of R
1990s: R developed concurrently with S
1993: R made public
Acceleration of R development
R-help and R-devel mailing lists
Creation of the R Core Group
Source: R Journal Vol 1/2
Growing number of packages
2001: ~100 packages
2009: over 2,000 packages
2000: R version 1.0.1
At the time of writing: R version 2.14
Source: R Journal Vol 1/2
Explosion in R's popularity over the last decade
Object-oriented, growing user base, scripting features
Free and open source
Irrational reasons: R seen as "cool"
Comparison of mailing lists
Evolution of traffic on the main mailing list of each software package. Source: R.A. Muenchen, r4stats.com
Popularity amongst programming languages
KDnuggets 2012 survey
Number of blogs
Software    Number of blogs
R           365
SAS         40
Stata       8
Others      0-3
Data as of March 2012
PRINCIPLE AND SOFTWARE PARADIGM
R is not really a piece of (statistical) software
R is rather a programming language
Limited user-friendly interfaces for data analysis
Object-oriented and mostly non-declarative
Similar to programming languages like Fortran, C, Java, Python
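To make "object-oriented" concrete: in R every value is an object with a class, and generic functions such as print() dispatch on that class (the S3 system). The sketch below defines an illustrative class; the class name "temperature" and the constructor temperature() are hypothetical, not part of R.

```r
# Every value in R is an object with a class
x <- c(2.5, 3.1, 4.8)
print(class(x))                 # [1] "numeric"
f <- factor(c("low", "high", "low"))
print(class(f))                 # [1] "factor"

# Hypothetical constructor for an illustrative S3 class "temperature"
temperature <- function(celsius) {
  structure(list(celsius = celsius), class = "temperature")
}

# print() is a generic function: for an object of class "temperature",
# R dispatches to the method print.temperature()
print.temperature <- function(x, ...) {
  cat("Temperature:", x$celsius, "degrees C\n")
  invisible(x)
}

t1 <- temperature(21)
print(t1)                       # Temperature: 21 degrees C
```

The same mechanism is what lets summary() or plot() behave differently on a data frame, a factor or a fitted model.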
R has limited graphical user interface (GUI) options
Recent efforts to make R more user-friendly
Several GUIs in development
R Commander
RKWard
Rattle
R Commander (Rcmdr)
RKWard
Rattle
Inherent limitations of pervasive Excel-like spreadsheets (vs. R)
Sophisticated but costly SAS (vs. R)
Screenshot of SAS Enterprise Miner 7.1. Source: sas.com
DESCRIPTION OF R INTERFACE
R console
RGui: the basic R interface
R desktop shortcut
R command line (space to write instructions)
Using the command line in the R console
First, an invalid sentence, followed by R's error message
Second, a correct sentence
Declaration and printing of the sentence as an R object
Simple math computations
Basic information about the R object containing the sentence
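A session of the kind described above might look like the following. The sentence "Hello world" and the object name sentence are assumptions, since the original screenshot is not reproduced here. At the console each result is printed automatically; in a script, print() is needed, so it is used throughout.

```r
# 1. An unquoted sentence is not valid R code. Typed at the prompt,
#      Hello world
#    it gives:  Error: unexpected symbol in "Hello world"
#    Here the same failure is caught with try() so the script keeps running.
res <- try(parse(text = "Hello world"), silent = TRUE)
print(inherits(res, "try-error"))    # [1] TRUE

# 2. A quoted sentence is a valid character string
print("Hello world")                 # [1] "Hello world"

# 3. Declare an R object with the assignment arrow, then print it
sentence <- "Hello world"
print(sentence)                      # [1] "Hello world"

# 4. Simple math computations
print(2 + 3 * 4)                     # [1] 14
print(sqrt(16))                      # [1] 4

# 5. Basic information about the object
print(class(sentence))               # [1] "character"
print(length(sentence))              # [1] 1  (one string, not one element per letter)
print(nchar(sentence))               # [1] 11 (number of characters)
str(sentence)                        #  chr "Hello world"
```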
RGui menu: File tab
File tab: usual basic and general operations
RGui menu: Edit tab
Edit tab: basic and general editing
Data editor: entering the object's name
Results of the data editor
RGui menu: View tab
View tab: showing the toolbar and/or status bar
RGui menu: Misc tab
Misc tab: miscellaneous operations
RGui menu: Packages tab
Packages tab: adding functions to the base R installation
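The Packages menu has console equivalents, which are handy in scripts. In this minimal sketch splines is used because it ships with every R installation; the mgcv check assumes nothing about whether mgcv is installed, and install.packages() needs an internet connection, so it is commented out.

```r
# Console equivalents of the Packages menu

# Install a package from CRAN (needs an internet connection):
# install.packages("mgcv")

# Attach an installed package so its functions become available
library(splines)                        # splines ships with R
print("package:splines" %in% search())  # [1] TRUE: now on the search path
print(head(ls("package:splines")))      # some of the functions it adds

# Check for a package without stopping if it is missing
if (requireNamespace("mgcv", quietly = TRUE)) {
  message("mgcv is installed, version ", as.character(packageVersion("mgcv")))
} else {
  message("mgcv is not installed")
}
```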
RGui menu: Windows tab
Windows tab: usual options to arrange (cascade or tile) the windows
RGui menu: Help tab
Help tab: links to essential help resources
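The entries of the Help menu can also be reached from the command line. The interactive calls are shown as comments because they open help pages or a browser; the last lines return values that a script can use.

```r
# Interactive help (type at the console):
# ?mean                # help page for mean(), same as help(mean)
# ??regression         # search all installed help pages, same as help.search("regression")
# example(mean)        # run the examples at the end of the help page
# help.start()         # HTML manuals in a browser, as in the Help menu

# Help that returns values, usable in scripts:
hits <- apropos("^mean")        # objects on the search path whose names start with "mean"
print(hits)
print(names(formals(sd)))       # argument names of sd(): "x" "na.rm"
```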
Arithmetic Operators in R
Operator    Description
+           Addition
-           Subtraction
*           Multiplication
/           Division
^           Exponent
%%          Modulus (remainder of division)
%/%         Integer division
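Each operator in the table applied to two numbers; the values 17 and 5 are arbitrary. %% and %/% satisfy a == (a %/% b) * b + a %% b, so with a negative operand the remainder takes the sign of the divisor.

```r
a <- 17
b <- 5
print(a + b)      # [1] 22
print(a - b)      # [1] 12
print(a * b)      # [1] 85
print(a / b)      # [1] 3.4
print(a ^ b)      # [1] 1419857
print(a %% b)     # [1] 2  (remainder)
print(a %/% b)    # [1] 3  (integer division)

# With a negative operand the remainder takes the sign of the divisor
print(-17 %% 5)   # [1] 3
print(-17 %/% 5)  # [1] -4, since -4 * 5 + 3 == -17

# Operators are vectorized: they apply element by element
print(c(1, 2, 3) * 2)   # [1] 2 4 6
```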
Relational Operators in R
Operator    Description
<           Less than
>           Greater than
<=          Less than or equal to
>=          Greater than or equal to
==          Equal to
!=          Not equal to
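Relational operators return logical (TRUE/FALSE) vectors, which are mostly used to filter data. The vector x is arbitrary; the last two lines show why == is risky on decimal numbers.

```r
x <- c(1, 5, 10)
print(x < 5)        # [1]  TRUE FALSE FALSE
print(x >= 5)       # [1] FALSE  TRUE  TRUE
print(x == 10)      # [1] FALSE FALSE  TRUE
print(x != 5)       # [1]  TRUE FALSE  TRUE

# Comparisons return logical vectors, often used to filter data
print(x[x > 3])     # [1]  5 10

# Beware of == on decimal numbers: floating-point rounding
print(0.1 + 0.2 == 0.3)                   # [1] FALSE
print(isTRUE(all.equal(0.1 + 0.2, 0.3)))  # [1] TRUE: comparison with a tolerance
```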
ADVANTAGES OF R
R "philosophy"
Open source code
You can access the code of the software
In-depth understanding of what R does
Modify the code
Example: the "mgcv" package webpage
Address of the "mgcv" package
Link to the package sources (.tar.gz file)
Screenshot of the CRAN webpage of the "mgcv" package. Source: CRAN
R access to source code
Example of source code of the "mgcv" package
1. Unzipping the mgcv_1.7-13.tar.gz file (with 7-Zip)
2. List of directories in the "mgcv" package (i.e. code sources)
3. List of functions (i.e. open code) in the "src" directory of the "mgcv" package
Screenshot of unzipping the "mgcv" package and browsing through the package's files.
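Besides downloading the .tar.gz archive, the R-level source of most functions can be read directly from the console. A short sketch using functions from base R and stats; compiled C code (like the files in mgcv's src directory) is only visible in the package sources.

```r
# Printing a function (same as typing sd at the console) shows its R source
print(sd)

# Parts of a function
print(formals(sd))       # arguments and their defaults
print(body(sd))          # the body of the function

# Methods that a package does not export can still be located
found <- getAnywhere("print.lm")
print(found$where)       # where the object was found

# Some functions hand over to compiled code; the C source is in the R sources
src_lines <- deparse(body(mean.default))
print(tail(src_lines, 2))   # contains the call .Internal(mean(x))

# Download the source archive of a CRAN package (needs an internet connection):
# download.packages("mgcv", destdir = tempdir(), type = "source")
```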
R is free
Software            Academics       Demo            Commercial (basic)   Commercial (full)
R                   Free            Free            Free                 Free
SAS                 Free to $100s   Not available   $1,000s              $10,000s
Statistica          $100s           30-day limit    ~$1,000              $10,000
Excel (Microsoft)   Free to $10s    Limited         ~$100                $100s
SPSS (IBM)          $100s           14-day limit    ~$2,000              $1,000s
Interfaces with other languages and scripting capabilities
Interfaces with virtually any other programming language
Fortran, C, C++, Python…
Adapt or rewrite your old code in R
R as a scripting language
R scripts can launch, or be launched by, other languages
The "mgcv.c" file in the "mgcv" package, written in standard C
Screenshot of the file "mgcv.c" of the "mgcv" package open in WordPad
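A minimal sketch of R used as a scripting language: the script reads its arguments with commandArgs(), so any program able to run a shell command (a Python subprocess call, a batch file, a scheduler) can launch it. The file name sum_args.R and the function sum_args() are hypothetical. In the other direction, system2() lets R launch other programs, and compiled code such as mgcv.c is called from R through .C() or .Call().

```r
# sum_args.R  (hypothetical file name)
# Usage from a shell or another language:  Rscript sum_args.R 1 2 3

# Hypothetical helper: add up numbers passed as command-line strings
sum_args <- function(args) {
  nums <- suppressWarnings(as.numeric(args))   # arguments arrive as character strings
  if (length(nums) == 0 || anyNA(nums)) stop("please pass one or more numbers")
  sum(nums)
}

cli_args <- commandArgs(trailingOnly = TRUE)   # arguments after the script name
if (length(cli_args) > 0) {
  cat(sum_args(cli_args), "\n")                # Rscript sum_args.R 1 2 3 prints 6
}

# The other direction: R launching another program (assumes python3 is on the PATH)
# system2("python3", "--version")
```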
R visualization capabilities
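A small example of base R graphics, using the cars data set that ships with R: a scatter plot with a fitted regression line, written to a PDF file. The output file name is arbitrary; on screen, the same plot() and abline() calls draw in the graphics window.

```r
# Write the plot to a PDF file in a temporary directory (file name is arbitrary)
out <- file.path(tempdir(), "cars_plot.pdf")
pdf(out, width = 7, height = 5)               # open a PDF graphics device

# Scatter plot of the built-in cars data set
plot(dist ~ speed, data = cars,
     main = "Stopping distance vs. speed (cars data set)",
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)",
     pch = 19, col = "grey40")

# Add a least-squares regression line
fit <- lm(dist ~ speed, data = cars)
abline(fit, col = "red", lwd = 2)

invisible(dev.off())                          # close the device and write the file
print(file.exists(out))                       # [1] TRUE
```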
R's role in academia
R: a tool used by leading researchers
Top-notch analytics capabilities
Screenshot of a user's Facebook map. Source: Paul Butler/Facebook, DG Rossiter, spatialanalysis.co.uk
To summarize
Free, open-source philosophy
R websites with many examples
Free books
Free online open courses
Twitter accounts
Online help and discussion
Mailing lists
Very active and diverse forums
Communities of developers and helpers
DRAWBACKS OF R
Average memory performance
Poor management of large datasets (data are held in RAM)
Avoid nested loops
Prefer R's advanced, vectorized operations on data structures
Complicated structure of packages in R
Dozens of packages
Each must be loaded into memory every time it is used
R packages to better manage memory
RHadoop (Hadoop, inspired by Google's MapReduce)
ff
bigmemory
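Why avoid nested loops: in the sketch below both versions build the same multiplication table, but the loop version runs every step in interpreted R code while outer() does the work in one vectorized call. The size n = 300 is arbitrary, and timings depend on the machine.

```r
n <- 300

# Nested loops: every element is filled by interpreted R code
m_loop <- matrix(0, nrow = n, ncol = n)   # preallocate rather than grow the matrix
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    m_loop[i, j] <- i * j
  }
}

# Vectorized equivalent: outer() builds the whole multiplication table in one call
m_vec <- outer(seq_len(n), seq_len(n))

print(all(m_loop == m_vec))               # [1] TRUE: same values

# Compare timings (figures depend on the machine)
print(system.time(for (i in seq_len(n)) for (j in seq_len(n)) m_loop[i, j] <- i * j))
print(system.time(outer(seq_len(n), seq_len(n))))
```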
Average computing performance
No parallel execution by default
R packages to use several cores
Advanced skills needed for high-performance computing
A high-level programming language
Abstract and modern (like Python…)
More productive coding
But further from "machine language"…
… which can make loop-heavy R code up to 100 times slower than C
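Since version 2.14, R ships with the parallel package for using several cores. A minimal sketch, assuming a Unix-like system for real parallelism: mclapply() forks worker processes, which Windows does not support, so there the code falls back to one core (a socket cluster via makeCluster() and parLapply() is the usual alternative on Windows). slow_square() is a hypothetical stand-in for a costly computation.

```r
library(parallel)                 # ships with R since version 2.14

print(detectCores())              # number of cores R can see (may be NA on some systems)

# Hypothetical stand-in for an expensive computation
slow_square <- function(x) {
  Sys.sleep(0.05)
  x^2
}

# Serial: one core does all the work
res_serial <- lapply(1:8, slow_square)

# Parallel: forked workers; mc.cores must be 1 on Windows, where forking is unavailable
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
res_par <- mclapply(1:8, slow_square, mc.cores = n_cores)

print(identical(res_serial, res_par))   # [1] TRUE: same results, in the same order
```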
Difficult data viewing and management
Difficult to inspect data sets
Screenshot of the R data editor and the "Viewtable" tab in SAS 9.3
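R has no spreadsheet-style view by default, but a few functions give a quick overview of a data set from the console. A sketch using mtcars, a data set that ships with R.

```r
# mtcars is a data set that ships with R
print(dim(mtcars))          # [1] 32 11 : rows and columns
str(mtcars)                 # type and first values of each column
print(head(mtcars, 3))      # first three rows
print(summary(mtcars$mpg))  # min, quartiles, mean and max of one column

# In an interactive session, View(mtcars) opens a spreadsheet-like viewer
```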
Difficult architecture management
Problems for large organizations
R made of several thousand independent packages
No deployment plan for complex organizations
No installation support
Lack of code accountability
Thousands of independent individual R developers
Nobody responsible for the quality of the code
Potentially high hidden costs with R
Total cost may favour commercial solutions for complex computations in large corporations
Relatively difficult to learn
Steep learning curve
R code far from what undergraduate computer science courses teach
Very complex data structures (useful once mastered)
Is R's syntax illogical?
Still, no more difficult to learn than SAS
Both SAS and R are more abstract than basic programming languages (Fortran, C…)
Difficult to learn = more rewarding professionally!!
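A quick tour of the main data structures behind that learning curve, with the different subsetting operators ([ ], [[ ]], $) that often confuse newcomers. All object names and values are arbitrary.

```r
v   <- c(10, 20, 30)                       # vector: elements of a single type
f   <- factor(c("a", "b", "a"))            # factor: categorical data with levels
m   <- matrix(1:6, nrow = 2)               # matrix: 2 x 3, filled column by column
lst <- list(name = "R", version = 2.14,    # list: elements of mixed types
            tags = c("free", "open source"))
dat <- data.frame(x = 1:3,                 # data frame: columns of equal length
                  grp = c("a", "b", "a"))

# Several subsetting operators, each with its own rules
print(v[2])                        # [1] 20
print(m[2, 3])                     # [1] 6
print(lst$tags[2])                 # [1] "open source"
print(lst[["name"]])               # [1] "R"
print(levels(f))                   # [1] "a" "b"
print(dat[dat$grp == "a", "x"])    # [1] 1 3
```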
SO WHY LEARN R?
More strengths than weaknesses
No language is perfect!!
Contradictory objectives to meet
Strengths and weaknesses of each language
Effect of legacy and the culture of the organization
Use existing solutions (system architecture, BA tools…)
Habits in business analytics
Different needs imply different tools
Large corporations + defined procedures → SAS-like tools
Fewer financial resources + quick proof of concept → R
Very appealing solution
Popularity of business analytics software (R, SAS, IBM SPSS, STATISTICA, own code), overall and among corporate, consultant, academic and NGO/government users (green = very popular, red = unpopular). Source: Rexer Analytics
REFERENCES FOR LEARNING R
Books
Many books available: choose the one that fits you!
Style, pedagogy, theory vs. practice
Browse several books at a local library or bookstore
Springer's Use R! series (http://www.springer.com/series/6991)
Recent, concise, good quality, affordable, diverse
Complete beginners: "A Beginner's Guide to R", "R by Example"
One step further: "Business Analytics for Managers"
Intensive Excel users: "R Through Excel"
O'Reilly R series (for programmers)
"R Cookbook", "R in a Nutshell"
Websites
R official websites
The R Project for Statistical Computing (www.r-project.org)
Mailing lists ("R-help", Special Interest Groups) and the R Journal
Official (austere) manuals ("An Introduction to R")
Other websites
UCLA online R resources (http://www.ats.ucla.edu/stat/r/)
R blogs aggregator (www.r-bloggers.com)
Social networks: LinkedIn groups (The R Project for Statistical Computing), Twitter accounts (@RevolutionR, @inside_R), job boards (Analytical Bridge…)
Conferences
Growing number of conferences about R
Official international useR! conference
Held annually over a few days, in a new venue each year (Google it!)
Lots of materials about many topics
Other conferences or venues
Conferences about business analytics (data mining, specialized topics…) with sessions involving R
Find (or even start!) an R user group close to your location (R Wiki geographical list, map of groups on "meetup.com")
Events and news from the R-bloggers blog