Unit 1 Factor

This document provides an overview of factors, lists, and data frames in R, explaining how to create and manipulate these data structures. It covers the creation of factors, lists, and data frames, along with methods for accessing and modifying their components. Additionally, it discusses how to add columns, combine data frames, and subset data based on logical conditions.

Uploaded by

Adarsh Sharma 12-A

Unit 1: Factor, List and Data Frames

Factors
In this section, you’ll look at some simple functions related to creating, handling, and
inspecting factors. Factors are R’s most natural way of representing data points that fit in only
one of a finite number of distinct categories, rather than belonging to a continuum. Categorical
variables in R are called “factors”. Factors have as many levels as there are unique categories.

# create a vector called 'gender'

gender <- c("f", "f", "f", "m", "m", "m", "m")
# transform 'gender' into a factor object
gender <- factor(gender)
# examine the structure of 'gender'
str(gender)
## Factor w/ 2 levels "f","m": 1 1 1 2 2 2 2
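
As a small sketch building on the gender example, the base R functions levels, nlevels, as.integer, and table can be used to inspect a factor. The variable name gender_f is only illustrative and does not appear in the original notes.

```r
# Recreate the factor from the example above (gender_f is an illustrative name)
gender_f <- factor(c("f", "f", "f", "m", "m", "m", "m"))

levels(gender_f)      # the distinct categories: "f" "m"
nlevels(gender_f)     # number of levels: 2
as.integer(gender_f)  # underlying integer codes: 1 1 1 2 2 2 2
table(gender_f)       # counts per level: f = 3, m = 4
```

The integer codes are what str displays after the level names; each code is the position of the observation's category in the levels vector.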

Lists of Objects

The list is an incredibly useful data structure. It can be used to group together any mix of R
structures and objects. A single list could contain a numeric matrix, a logical array, a single
character string, and a factor object. You can even have a list as a component of another list.
In this section, you’ll see how to create, modify, and access components of these flexible
structures. Creating a list is much like creating a vector. You supply the elements that you
want to include to the list function, separated by commas.

R> foo <- list(matrix(data=1:4,nrow=2,ncol=2),c(TRUE,FALSE,TRUE,TRUE),"hello")

R> foo

[[1]]
     [,1] [,2]
[1,]    1    3
[2,]    2    4

[[2]]
[1]  TRUE FALSE  TRUE  TRUE

[[3]]
[1] "hello"

In the list foo, you’ve stored a 2 × 2 numeric matrix, a logical vector, and a character string.
These are printed in the order they were supplied to list. Just as with vectors, you can use the
length function to check the number of components in a list.

R> length(x=foo)
[1] 3

You can retrieve components from a list using indexes, which are entered in double square
brackets.

R> foo[[1]]
     [,1] [,2]
[1,]    1    3
[2,]    2    4
R> foo[[3]]
[1] "hello"

This action is known as a member reference. When you’ve retrieved a component this way,
you can treat it just like a stand-alone object in the workspace.
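
The following sketch illustrates what treating a retrieved member as a stand-alone object can look like. It also contrasts double square brackets with single square brackets, a point the notes above do not cover: single brackets return a sublist rather than the member itself.

```r
# Same list as in the text
foo <- list(matrix(data = 1:4, nrow = 2, ncol = 2),
            c(TRUE, FALSE, TRUE, TRUE),
            "hello")

# A retrieved member behaves like a stand-alone object
foo[[1]] + 1L     # adds 1 to every element of the matrix
foo[[1]][2, 1]    # row 2, column 1 of the matrix: 2
sum(foo[[2]])     # number of TRUE values: 3

# Single brackets return a list containing the member, not the member itself
class(foo[[3]])   # "character"
class(foo[3])     # "list"
```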

To overwrite a member of foo, you use the assignment operator.

R> foo[[3]]
[1] "hello"
R> foo[[3]] <- paste(foo[[3]],"you!")
R> foo
[[1]]
     [,1] [,2]
[1,]    1    3
[2,]    2    4

[[2]]
[1]  TRUE FALSE  TRUE  TRUE

[[3]]
[1] "hello you!"

Naming
You can name list components to make the elements more recognizable and easy to work
with. Just like the information stored about factor levels, a name is an R attribute.

Let’s start by adding names to the list foo from earlier.

R> names(foo) <- c("mymatrix","mylogicals","mystring")

R> foo
$mymatrix
     [,1] [,2]
[1,]    1    3
[2,]    2    4

$mylogicals
[1]  TRUE FALSE  TRUE  TRUE

$mystring
[1] "hello you!"

This has changed how the object is printed to the console. Where earlier it printed [[1]], [[2]],
and [[3]] before each component, now it prints the names you specified: $mymatrix,
$mylogicals, and $mystring. You can now perform member referencing using these names
and the dollar operator, rather than the double square brackets.

R> foo$mymatrix
     [,1] [,2]
[1,]    1    3
[2,]    2    4

This is the same as calling foo[[1]]. In fact, even when an object is named, you can still use
the numeric index to obtain a member.

R> foo[[1]]
     [,1] [,2]
[1,]    1    3
[2,]    2    4
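
As a hedged sketch of working with names, the block below builds the named list in one step, accesses a member by its name as a string, and adds a new member that is itself a list. The member name mynested and its contents are hypothetical and serve only to show a list nested inside another list.

```r
# Names can be supplied directly when the list is created
foo <- list(mymatrix = matrix(1:4, nrow = 2, ncol = 2),
            mylogicals = c(TRUE, FALSE, TRUE, TRUE),
            mystring = "hello you!")

names(foo)           # "mymatrix" "mylogicals" "mystring"
foo[["mystring"]]    # same member as foo$mystring and foo[[3]]

# Add a new named member; here it is itself a list (hypothetical names)
foo$mynested <- list(a = 1:3, b = "inner")
length(foo)          # 4
foo$mynested$b       # "inner"
```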

Data Frames
A data frame is R’s most natural way of presenting a data set with a collection of recorded
observations for one or more variables. Like lists, data frames have no restriction on the data
types of the variables; you can store numeric data, factor data, and so on. The R data frame
can be thought of as a list with some extra rules attached. The most important distinction is
that in a data frame (unlike a list), the members must all be vectors of equal length.
The data frame is one of the most important and frequently used tools in R for statistical data
analysis. To create a data frame from scratch, use the data.frame function. You supply your
data, grouped by variable, as vectors of the same length—the same way you would construct
a named list. Consider the following example data set:

R> mydata <- data.frame(person=c("Peter","Lois","Meg","Chris","Stewie"),
                        age=c(42,40,17,14,1),
                        gender=factor(c("M","F","F","M","M")))
R> mydata
  person age gender
1  Peter  42      M
2   Lois  40      F
3    Meg  17      F
4  Chris  14      M
5 Stewie   1      M

Here, you’ve constructed a data frame with the first name, age in years, and gender of five
individuals. The returned object should make it clear why vectors passed to data.frame must
be of equal length: vectors of differing lengths wouldn’t make sense in this context. If you
pass vectors of unequal length to data.frame, R recycles a shorter vector only when its length
divides the length of the longest vector evenly, which can silently allocate observations to
the wrong record; otherwise it stops with an error about differing numbers of rows.

Notice that data frames are printed to the console in rows and columns, so they look more like
a matrix than a named list. This natural spreadsheet style makes it easy to read and
manipulate data sets. Each row in a data frame is called a record, and each column is a
variable.
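
The recycling behavior of data.frame can be sketched as follows. The object names recycled and result and the column names id and group are illustrative only; stringsAsFactors is set explicitly so the result does not depend on the R version.

```r
# Length 2 divides length 4 evenly, so the shorter vector is recycled silently
recycled <- data.frame(id = 1:4, group = c("a", "b"), stringsAsFactors = FALSE)
recycled$group    # "a" "b" "a" "b"

# Length 3 does not divide length 4, so data.frame() stops with an error;
# tryCatch() captures the message instead of halting the script
result <- tryCatch(
  data.frame(id = 1:4, group = c("a", "b", "c")),
  error = function(e) conditionMessage(e)
)
result            # the error message as a character string
```

Silent recycling is the more dangerous case, because the resulting data frame looks valid even though values were repeated to fill the rows.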
You can extract portions of the data by specifying row and column index positions (much as
with a matrix). Here’s an example:

R> mydata[2,2]
[1] 40

This gives you the element at row 2, column 2—the age of Lois. Now extract the third, fourth,
and fifth elements of the third column:

R> mydata[3:5,3]
[1] F M M
Levels: F M

This returns a factor vector with the gender of Meg, Chris, and Stewie. The following extracts
the entire third and first columns (in that order):

R> mydata[,c(3,1)]
  gender person
1      M  Peter
2      F   Lois
3      F    Meg
4      M  Chris
5      M Stewie

This results in another data frame giving the gender and then the name of each person.
You can also use the names of the vectors that were passed to data.frame to access variables
even if you don’t know their column index positions, which can be useful for large data sets.
You use the same dollar operator you used for member-referencing named lists.

R> mydata$age
[1] 42 40 17 14 1

You can subset this returned vector, too:

R> mydata$age[2]
[1] 40

This returns the same thing as the earlier call of mydata[2,2].
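
As a sketch of name-based access, the calls below use variable names inside the square brackets instead of numeric column positions. This relies on standard base R indexing; the data frame is rebuilt here only so the block runs on its own.

```r
mydata <- data.frame(person = c("Peter", "Lois", "Meg", "Chris", "Stewie"),
                     age = c(42, 40, 17, 14, 1),
                     gender = factor(c("M", "F", "F", "M", "M")),
                     stringsAsFactors = FALSE)

mydata[["age"]]                          # same vector as mydata$age
mydata[2, "age"]                         # 40, same as mydata[2,2]
mydata[mydata$person == "Lois", "age"]   # 40, found by name rather than row index
mydata[, c("gender", "person")]          # same data frame as mydata[,c(3,1)]
```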

You can report the size of a data frame—the number of records and variables—just as you’ve
seen for the dimensions of a matrix.

R> nrow(mydata)
[1] 5

R> ncol(mydata)
[1] 3

R> dim(mydata)
[1] 5 3

The nrow function retrieves the number of rows (records), ncol retrieves the number of
columns (variables), and dim retrieves both.
In versions of R before 4.0.0, the default behavior for character vectors passed to data.frame
was to convert each such variable into a factor object. In such a version, you would observe
the following:

R> mydata$person
[1] Peter  Lois   Meg    Chris  Stewie
Levels: Chris Lois Meg Peter Stewie

Notice that this variable has levels, which shows it’s being treated as a factor. But this isn’t
what you intended when you defined mydata earlier: you explicitly defined gender to be a
factor but left person as a vector of character strings. To prevent this automatic conversion of
character strings to factors when using data.frame, set the optional argument stringsAsFactors
to FALSE (it defaulted to TRUE before R 4.0.0 and defaults to FALSE from R 4.0.0 onward).
Reconstructing mydata with this in place looks like this:

R> mydata <- data.frame(person=c("Peter","Lois","Meg","Chris","Stewie"),
                        age=c(42,40,17,14,1),
                        gender=factor(c("M","F","F","M","M")),
                        stringsAsFactors=FALSE)

R> mydata
  person age gender
1  Peter  42      M
2   Lois  40      F
3    Meg  17      F
4  Chris  14      M
5 Stewie   1      M

R> mydata$person
[1] "Peter"  "Lois"   "Meg"    "Chris"  "Stewie"

You now have person in the desired, nonfactor form.
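
One way to confirm how a column was stored is to check its class. The sketch below sets stringsAsFactors explicitly in both directions, so the outcome should not depend on the installed R version. The object names people, as_chr, and as_fct are illustrative.

```r
people <- c("Peter", "Lois", "Meg", "Chris", "Stewie")

# Setting the argument explicitly avoids relying on the version-specific default
as_chr <- data.frame(person = people, stringsAsFactors = FALSE)
as_fct <- data.frame(person = people, stringsAsFactors = TRUE)

class(as_chr$person)   # "character"
class(as_fct$person)   # "factor"
nlevels(as_fct$person) # 5, one level per distinct name
str(as_chr)            # compact summary of the column types
```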

Adding Data Columns and Combining Data Frames
Say you want to add data to an existing data frame. This could be a set of observations for a
new variable (adding to the number of columns), or it could be more records (adding to the
number of rows). Once again, you can use some of the functions you’ve already seen applied
to matrices.
Recall the rbind and cbind functions, which let you append rows and columns, respectively.
These same functions can be used to extend data frames intuitively. For example, suppose
you had another record to include in mydata: the age and gender of another individual, Brian.
The first step is to create a new data frame that contains Brian’s information.

R> newrecord <- data.frame(person="Brian",age=7,
                           gender=factor("M",levels=levels(mydata$gender)))
R> newrecord
  person age gender
1  Brian   7      M

To avoid any confusion, it’s important to make sure the variable names and the data types
match the data frame you’re planning to add this to. Note that for a factor, you can extract the
levels of the existing factor variable using levels.
Now, you can simply call the following:

R> mydata <- rbind(mydata,newrecord)
R> mydata
  person age gender
1  Peter  42      M
2   Lois  40      F
3    Meg  17      F
4  Chris  14      M
5 Stewie   1      M
6  Brian   7      M

Using rbind, you combined mydata with the new record and overwrote mydata with the result.

Adding a variable to a data frame is also quite straightforward. Let’s say you’re now given
data on the classification of how funny these six individuals are, defined as a “degree of
funniness.” The degree of funniness can take three possible values: Low, Med (medium), and
High. Suppose Peter, Lois, and Stewie have a high degree of funniness, Chris and Brian have
a medium degree of funniness, and Meg has a low degree of funniness. In R, you’d have a
factor vector like this:

R> funny <- c("High","High","Low","Med","High","Med")
R> funny <- factor(x=funny, levels=c("Low","Med","High"))
R> funny
[1] High High Low  Med  High Med
Levels: Low Med High

The first line creates the basic character vector as funny, and the second line overwrites funny
by turning it into a factor. The order of these elements must correspond to the records in your
data frame. Now, you can simply use cbind to append this factor vector as a column to the
existing mydata.

R> mydata <- cbind(mydata,funny)
R> mydata
  person age gender funny
1  Peter  42      M  High
2   Lois  40      F  High
3    Meg  17      F   Low
4  Chris  14      M   Med
5 Stewie   1      M  High
6  Brian   7      M   Med

The rbind and cbind functions aren’t the only ways to extend a data frame. One useful
alternative for adding a variable is to use the dollar operator, much like adding a new member
to a named list. Suppose now you want to add another variable to mydata by including a
column with the age of the individuals in months, not years, calling this new variable age.mon.

R> mydata$age.mon <- mydata$age*12
R> mydata
  person age gender funny age.mon
1  Peter  42      M  High     504
2   Lois  40      F  High     480
3    Meg  17      F   Low     204
4  Chris  14      M   Med     168
5 Stewie   1      M  High      12
6  Brian   7      M   Med      84

This creates a new age.mon column with the dollar operator and at the same time assigns it
the vector of ages in years (already stored as age) multiplied by 12.
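
The sketch below compares three ways of adding the same column: the dollar operator, cbind, and double square brackets with a name. It uses a shortened three-row version of the data, and the object names via_dollar, via_cbind, and via_bracket are illustrative. Assigning NULL to remove a column is standard base R behavior, although the notes above do not mention it.

```r
mydata <- data.frame(person = c("Peter", "Lois", "Meg"),
                     age = c(42, 40, 17),
                     stringsAsFactors = FALSE)

# Dollar operator
via_dollar <- mydata
via_dollar$age.mon <- via_dollar$age * 12

# cbind with a named argument
via_cbind <- cbind(mydata, age.mon = mydata$age * 12)

# Double square brackets with the new column name as a string
via_bracket <- mydata
via_bracket[["age.mon"]] <- mydata$age * 12

# Check that the three approaches agree
same_cbind <- isTRUE(all.equal(via_dollar, via_cbind))
same_bracket <- isTRUE(all.equal(via_dollar, via_bracket))

# Assigning NULL removes a column again
via_dollar$age.mon <- NULL
names(via_dollar)    # "person" "age"
```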

Logical Record Subsets

You have already seen how to use logical flag vectors to subset data structures. This is a particularly useful
technique with data frames, where you’ll often want to examine a subset of entries that meet
certain criteria. For example, when working with data from a clinical drug trial, a researcher
might want to examine the results for just male participants and compare them to the results
for females. Or the researcher might want to look at the characteristics of individuals who
responded most positively to the drug.
Let’s continue to work with mydata. Say you want to examine all records corresponding to
males. You know that the following line will identify the relevant positions in the gender factor
vector:

R> mydata$gender=="M"
[1]  TRUE FALSE FALSE  TRUE  TRUE  TRUE

This flags the male records. You can use this with the matrix-like syntax to get the male-only
subset.

R> mydata[mydata$gender=="M",]
  person age gender funny age.mon
1  Peter  42      M  High     504
4  Chris  14      M   Med     168
5 Stewie   1      M  High      12
6  Brian   7      M   Med      84

This returns data for all variables for only the male participants. You can use the same
behavior to pick and choose which variables to return in the subset. For example, since you
know you are selecting the males only, you could omit gender from the result using a negative
numeric index in the column dimension.

R> mydata[mydata$gender=="M",-3]
  person age funny age.mon
1  Peter  42  High     504
4  Chris  14   Med     168
5 Stewie   1  High      12
6  Brian   7   Med      84

If you don’t have the column number or if you want to have more control over the returned
columns, you can use a character vector of variable names instead.

R> mydata[mydata$gender=="M",c("person","age","funny","age.mon")]
  person age funny age.mon
1  Peter  42  High     504
4  Chris  14   Med     168
5 Stewie   1  High      12
6  Brian   7   Med      84

The logical conditions you use to subset a data frame can be as simple or as complicated as
you need them to be. The logical flag vector you place in the square brackets just has to match
the number of records in the data frame. Let’s extract from mydata the full records for
individuals who are more than 10 years old OR have a high degree of funniness.

R> mydata[mydata$age>10|mydata$funny=="High",]
  person age gender funny age.mon
1  Peter  42      M  High     504
2   Lois  40      F  High     480
3    Meg  17      F   Low     204
4  Chris  14      M   Med     168
5 Stewie   1      M  High      12
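
As a further sketch, the block below combines conditions with AND instead of OR, and shows two base R helpers that the notes above do not cover: subset, which evaluates the condition inside the data frame, and which, which converts logical flags into row positions. The data frame is rebuilt so the block runs on its own.

```r
mydata <- data.frame(person = c("Peter", "Lois", "Meg", "Chris", "Stewie", "Brian"),
                     age = c(42, 40, 17, 14, 1, 7),
                     gender = factor(c("M", "F", "F", "M", "M", "M")),
                     funny = factor(c("High", "High", "Low", "Med", "High", "Med"),
                                    levels = c("Low", "Med", "High")),
                     stringsAsFactors = FALSE)

# AND condition: males older than 10 (Peter and Chris)
older_males <- mydata[mydata$gender == "M" & mydata$age > 10, ]
older_males

# The same rows via subset(), dropping the gender column as well
subset(mydata, gender == "M" & age > 10, select = -gender)

# which() turns the logical flags into row positions: 1 2 3 4 5
which(mydata$age > 10 | mydata$funny == "High")
```

The bracket form is the safer choice inside functions, since subset relies on looking up column names in the data frame at call time.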
