STATS 20: Chapter 6 - Factors

Thomas Maierhofer

Fall 2024

Learning Objectives

After studying this chapter, you should be able to:

▶ Identify when to use factors.
▶ Create factors using factor().
▶ Differentiate between character vectors and factors.
▶ Understand how R stores factors.
▶ Summarize a categorical variable using table().
▶ Assign and reassign levels to a factor.
▶ Order the levels of a factor.
▶ Apply a function to subsets of a vector using tapply().

Basic Definitions
Factors in Experimental Design

▶ In experimental design, a factor is an explanatory variable controlled by the
experimenter.
▶ Its levels are the different values a factor can take.
▶ Example: In an experiment on headache medications:
▶ The factor might be the medication.
▶ The levels could be the types of medication, like acetaminophen, ibuprofen, and
naproxen.

Factors are Categorical Variables

▶ The levels (categories) of a factor are often represented by numbers, either to
show an ordering or simply by convention.
▶ Example: The Saffir-Simpson hurricane scale uses numbers to denote hurricane
categories:
▶ Category 1, Category 2, etc., based on maximum sustained wind speeds.
▶ Problem: If entered as a numeric vector, R may not recognize that this represents
categorical data.

Review: Analyzing Categorical vs. Numerical Data

▶ Categorical variables and numerical variables are analyzed differently:
▶ The mean of ibuprofen and naproxen, for instance, is not meaningful.
▶ Computing relative frequencies of reaction times in milliseconds rarely makes
sense, since almost every measured value occurs only once.
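
The contrast can be made concrete with a small sketch. The vector names below are made up for illustration. R will happily average category codes stored as numbers, whereas a factor steers us toward counts and relative frequencies.

```r
# Hypothetical hurricane categories entered as plain numbers
cat_num <- c(3, 1, 2, 5, 3, 3, 5)
avg <- mean(cat_num)   # about 3.14, but an "average category" is hard to interpret

# Stored as a factor, the same values are treated as categories
cat_fct <- factor(cat_num)
counts <- table(cat_fct)              # frequency of each observed category
rel_freq <- counts / length(cat_fct)  # relative frequencies
counts
rel_freq
```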

Factors in R
▶ Factors in R provide a way to store categorical data, especially when a vector
represents categories (levels).
▶ The factor() and as.factor() functions create or coerce vectors into factors.

group <- c("control", "treatment", "control", "treatment", "treatment")

group # Character vector

## [1] "control" "treatment" "control" "treatment" "treatment"

# Convert to a factor
group <- factor(group)
group # Factor vector

## [1] control treatment control treatment treatment

## Levels: control treatment
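
A quick way to tell a character vector from a factor is to inspect both objects. This is only an illustrative sketch; the names chr and fct are made up for the example.

```r
# Hypothetical example vectors (not part of the course data)
chr <- c("control", "treatment", "control")
fct <- factor(chr)

class(chr)         # "character"
class(fct)         # "factor"
is.factor(fct)     # TRUE
is.character(fct)  # FALSE: a factor is not a character vector

# as.character() recovers the labels as a character vector
back <- as.character(fct)
identical(back, chr)  # TRUE
```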
Operations on Factors

▶ Factors represent categorical data, so arithmetic operations cannot be applied
directly to them.
▶ Warning: Attempting to apply numeric operations to factors will cause a warning
and produce NA values.

group + 1

## Warning in Ops.factor(group, 1): ‘+’ not meaningful for factors

## [1] NA NA NA NA NA

Working with Levels
Efficient Storage of Factors

▶ Factors are stored efficiently in R by internally coding levels as integers.

▶ This reduces memory usage for repeated values compared to character vectors.

typeof(group) # Internal storage type of the factor vector

## [1] "integer"

as.integer(group) # How levels of `group` are coded in R internally

## [1] 1 2 1 2 2
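
The integer coding matters most when the labels themselves look like numbers. The sketch below uses a made-up factor called dose to show that as.integer() returns the internal codes, and that the labels have to be converted via character to recover the original values.

```r
# Hypothetical factor whose labels look like numbers
dose <- factor(c(10, 20, 10, 50))

codes <- as.integer(dose)                 # 1 2 1 3: internal codes, not the doses
values <- as.numeric(as.character(dose))  # 10 20 10 50: the original values
values2 <- as.numeric(levels(dose))[dose] # same result, converting each level only once

codes
values
```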

The levels() Function
The levels() function accesses the levels attribute of a factor vector.

attributes(group)

## $levels
## [1] "control" "treatment"
##
## $class
## [1] "factor"

The levels are stored as a character vector:

levels(group) # Access the factor levels

## [1] "control" "treatment"

Modifying Factor Labels with levels()

▶ You can use levels() with the assignment operator <- to change factor labels.
▶ For example, change "control" to "placebo" in the group factor.

levels(group)[1] <- "placebo"

group # factor label for the first level is now "placebo"

## [1] placebo treatment placebo treatment treatment

## Levels: placebo treatment

Counting and Summarizing Levels

▶ The nlevels() function returns the number of levels in the factor.

▶ The table() function outputs a frequency table summarizing the factor levels.

nlevels(group) # Number of levels in `group`

## [1] 2

table(group) # Frequency table of the `group` factor

## group
## placebo treatment
## 2 3
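
Counts are often easier to compare as relative frequencies. As a small sketch, the group factor is recreated here so the example runs on its own, and prop.table() converts the table of counts into proportions.

```r
# Recreate the factor from this section so the example is self-contained
group <- factor(c("placebo", "treatment", "placebo", "treatment", "treatment"))

counts <- table(group)
props <- prop.table(counts)     # placebo 0.4, treatment 0.6
props2 <- counts / sum(counts)  # the same proportions computed by hand
props
```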

Caution: Changing Factor Values
▶ Assigning a value that is not an existing level to a factor element will:
▶ Replace the value with NA.
▶ Throw a warning.
group[5] <- "control" # Warning: "control" is not an existing level

## Warning in `[<-.factor`(`*tmp*`, 5, value = "control"): invalid factor level,
## NA generated
group

## [1] placebo treatment placebo treatment <NA>

## Levels: placebo treatment

group[5] <- "placebo" # No warning: "placebo" is an existing level

group

## [1] placebo treatment placebo treatment placebo

## Levels: placebo treatment
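
If the new value is really wanted, one way to avoid the NA is to register the level before assigning it. The following is a minimal sketch of that idea, with group recreated so the example runs on its own.

```r
# Recreate the factor so the example is self-contained
group <- factor(c("placebo", "treatment", "placebo", "treatment", "placebo"))

# Register the new level first, then assign it
levels(group) <- c(levels(group), "control")
group[5] <- "control"   # no warning, because "control" is now a level

group
levels(group)           # "placebo" "treatment" "control"
```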

Specifying All Possible Levels

The levels argument in the factor() function allows us to specify all possible
levels, even if some levels are not (yet) observed in the data.

# Sample hurricane category data with all possible levels

hurricanes <- factor(c(3, 1, 2, 5, 3, 3, 5), levels = c(1, 2, 3, 4, 5))
hurricanes

## [1] 3 1 2 5 3 3 5
## Levels: 1 2 3 4 5

Here, levels 1 through 5 are specified, even though level 4 is not observed.

Adding Levels Using the levels() Function
Levels can also be added to an existing factor by modifying the levels attribute
directly.
# Sample gender data
gender <- factor(c("M", "F", "F", "M", "M"))
levels(gender) # Current levels: "F", "M" (sorted alphabetically)

## [1] "F" "M"

levels(gender)[3] <- "X" # Add a new level "X"

levels(gender) # View updated levels

## [1] "F" "M" "X"

gender

## [1] M F F M M
## Levels: F M X

The gender factor now has an additional level “X”, even though “X” is not observed in
the data.
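
One visible consequence of an unobserved level is that table() reports it with a count of zero. A short sketch, with the factor built in one step so it runs on its own:

```r
# Same data as above, with the extra level "X" declared up front
gender <- factor(c("M", "F", "F", "M", "M"), levels = c("F", "M", "X"))

counts <- table(gender)
counts   # F: 2, M: 3, X: 0
```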
Extracting Values from Factors
▶ Factors are special vectors that can be subset using square brackets (numeric
indices and logical indices work).
▶ When subsetting, the levels attribute of the original factor is retained, even if the
subset does not include all levels.

hurricanes[1:3] # Only contains 1, 2, 3

## [1] 3 1 2
## Levels: 1 2 3 4 5

hurricanes[c(rep(TRUE, 3), rep(FALSE, 4))] # same as above

## [1] 3 1 2
## Levels: 1 2 3 4 5
Removing Unobserved Levels
To remove the unobserved levels, we could invoke the factor() function again to
reset the levels attribute:

factor(hurricanes[1:3]) # resets the levels attribute

## [1] 3 1 2
## Levels: 1 2 3

or more directly remove unobserved levels by specifying the argument drop = TRUE in
the square brackets:

hurricanes[1:3, drop = TRUE] # remove unobserved levels

## [1] 3 1 2
## Levels: 1 2 3
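
Base R also provides droplevels() for the same purpose, which some readers find more explicit than calling factor() again. The sketch below recreates hurricanes so it runs on its own.

```r
# Recreate the factor so the example is self-contained
hurricanes <- factor(c(3, 1, 2, 5, 3, 3, 5), levels = c(1, 2, 3, 4, 5))

full <- droplevels(hurricanes)        # drops level 4, which never occurs
part <- droplevels(hurricanes[1:3])   # keeps only the levels seen in the subset

levels(full)   # "1" "2" "3" "5"
levels(part)   # "1" "2" "3"
```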
Ordered Levels
Ordered vs. Unordered Levels

▶ Ordinal variables: Categorical variables with a natural ordering (e.g., hurricane
categories, coffee sizes).
▶ Nominal variables: Categorical variables without a natural ordering (e.g., gender,
eye color).

Default Ordering in factor()

▶ By default, factor() orders character levels alphabetically and numeric levels in
increasing order.
▶ In most locales, lowercase letters sort before their uppercase counterparts
(a < A); the exact order depends on the locale.
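
The difference between numeric and character ordering is easy to miss when numbers are stored as text. A small sketch with made-up values:

```r
# Numeric input: levels are sorted as numbers
num_levels <- levels(factor(c(10, 2, 1)))        # "1" "2" "10"

# Character input: levels are sorted as strings, so "10" comes before "2"
chr_levels <- levels(factor(c("10", "2", "1")))  # "1" "10" "2"

num_levels
chr_levels
```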

Example: Month Names in Alphabetical Order

If we create a factor of month names, the natural ordering will not be preserved.

month.name # Built-in character vector of month names

## [1] "January" "February" "March" "April" "May" "June"
## [7] "July" "August" "September" "October" "November" "December"

factor(month.name) # Alphabetical order

## [1] January February March April May June July
## [8] August September October November December
## 12 Levels: April August December February January July June March ... September

The table() function orders its counts by the factor's levels, which gives unintuitive
results when the levels are not in the intended order.

table(x = factor(month.name))

## x
##     April    August  December  February   January      July      June
##         1         1         1         1         1         1         1
##     March       May  November   October September
##         1         1         1         1         1

The same happens with the plot() function:

plot(x = factor(month.name), y = 1:12)

[Figure: plot of y (1 to 12) against factor(month.name). The x-axis shows the month
names in alphabetical order, from April to September.]

Specifying a Custom Order for Levels
To set a custom order for levels, use the levels argument in factor(). Optionally, also
set ordered = TRUE to store the result explicitly as an ordered factor.

factor(month.name, levels = month.name) # levels in correct calendar order

## [1] January February March April May June July
## [8] August September October November December
## 12 Levels: January February March April May June July August ... December

table(factor(month.name, levels = month.name))

##
##   January  February     March     April       May      June      July
##         1         1         1         1         1         1         1
##    August September   October  November  December
##         1         1         1         1         1
plot(x = factor(month.name, levels = month.name), y = 1:12) # much better
[Figure: plot of y (1 to 12) against factor(month.name, levels = month.name). The
x-axis shows the month names in calendar order, from January to December.]

Explicitly Creating an Ordered Factor

▶ The class "ordered" is a subclass of "factor"; it marks a variable as an ordered
factor, i.e., an ordinal categorical variable.
▶ Most functions do not care about this distinction and simply use whatever order
the levels are in.

ordered <- factor(month.name, levels = month.name, ordered = FALSE)

explicitly_ordered <- factor(month.name, levels = month.name, ordered = TRUE)

attributes(ordered) # class is "factor", levels are in correct order

## $levels
## [1] "January" "February" "March" "April" "May" "June"
## [7] "July" "August" "September" "October" "November" "December"
##
## $class
## [1] "factor"

attributes(explicitly_ordered) # this is of class "ordered" as well as "factor"

## $levels
## [1] "January" "February" "March" "April" "May" "June"
## [7] "July" "August" "September" "October" "November" "December"
##
## $class
## [1] "ordered" "factor"
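
One place where the "ordered" class does make a practical difference is comparisons. The sketch below uses a made-up coffee-size factor; with an ordered factor, operators such as < and functions such as max() follow the level order, while the same comparison on an unordered factor yields NA with a warning.

```r
# Hypothetical ordinal variable: coffee sizes
sizes <- factor(c("small", "large", "medium"),
                levels = c("small", "medium", "large"),
                ordered = TRUE)

below_large <- sizes < "large"   # TRUE FALSE TRUE
largest <- max(sizes)            # large

# The same data as an unordered factor: comparisons like
# sizes_unordered < "large" would return NA with a warning
sizes_unordered <- factor(sizes, ordered = FALSE)

below_large
largest
```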
Operations on Subsets of Data
The Split-Apply-Combine Strategy

▶ Split the data into groups based on some criteria.

▶ Apply a function to each group independently.
▶ Combine the results into a data structure.

More specifically:
▶ Using factor levels to subset and analyze specific categories is common in
statistical analysis, such as finding means or counts per level.
▶ Subsetting and logical indexing allow us to extract subsets of an object (usually
a vector or matrix) based on a condition or criterion.
▶ A natural application is to subset an object based on the levels of a factor (i.e.,
categories of a categorical variable).
▶ We can then apply functions to these subsets, enabling flexible data
manipulation and analysis by category.
▶ Finally, we combine the results in a vector or matrix.
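
The three steps can be written out by hand with base R. This is a sketch on made-up data (the names x and g are hypothetical): split() does the splitting, and sapply() applies the function and combines the results into a named vector.

```r
# Hypothetical measurements and their group labels
x <- c(5, 7, 6, 10, 12)
g <- factor(c("a", "a", "a", "b", "b"))

pieces <- split(x, g)           # Split: a list with one numeric vector per level
means <- sapply(pieces, mean)   # Apply and combine: a named numeric vector

pieces
means                           # a = 6, b = 11
```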

The tapply() Function
The tapply() function is used to apply a function to subsets of a vector.
The syntax of tapply() is tapply(X, INDEX, FUN, ..., simplify = TRUE),
where the arguments are:
▶ X: A vector, typically numeric or logical.
▶ INDEX: A factor or list of factors that identifies the subsets. Non-factors will be
coerced into factors.
▶ FUN: The function to be applied.
▶ ...: Any optional arguments to be passed to the FUN function.
▶ simplify: Logical value that specifies whether to simplify the output to a matrix
or array.

The tapply() function splits the values of the vector X into groups, each group
corresponding to a level of the INDEX factor, then applies the function in FUN to each
group.
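
As a minimal sketch of the call on made-up data (the reaction times below are hypothetical), the example also shows how an extra argument such as na.rm = TRUE is passed through ... to FUN.

```r
# Hypothetical reaction times (ms) with one missing value
reaction <- c(210, 250, NA, 300, 280)
group <- factor(c("control", "treatment", "control", "treatment", "treatment"))

raw <- tapply(reaction, group, mean)                   # control is NA because of the missing value
clean <- tapply(reaction, group, mean, na.rm = TRUE)   # na.rm = TRUE is passed on to mean()

raw
clean   # control 210, treatment about 276.67
```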
Example: Hurricanes Data
As an example, we will consider the hurricanes.RData file, which contains four objects
(category, pressure, wind, and year) with measurements on 455 hurricanes
that occurred between 2006 and 2011.

load("hurricanes.RData") # Load the objects in the hurricanes data

category[1:10] # The Saffir-Simpson classification

## [1] 1 2 1 1 2 2 1 2 1 1
## Levels: 1 < 2 < 3 < 4 < 5

pressure[1:10] # Air pressure at the hurricane's center (in millibars)

## [1] 983 968 981 960 952 983 981 953 985 990

wind[1:10] # Hurricane's maximum sustained wind speed (in knots)

## [1] 65 90 80 65 95 85 70 95 80 70
Example ctd: Using tapply() for Grouped Calculations
▶ Suppose we want to determine if mean air pressure at a hurricane’s center is
related to its category.
▶ The tapply() function allows us to split the data by category and compute the
mean for each subset.

# Compute mean pressure grouped by hurricane category

tapply(X = pressure, INDEX = category, FUN = mean)

##        1        2        3        4        5
## 979.3766 964.3333 954.7407 940.3220 924.3000

From the output, we see that the mean pressure at a hurricane’s center is lower for
higher category hurricanes.
Question: How would you find the mean maximum sustained wind speed in each year?
Example ctd: Using tapply() with Multiple Factors
▶ Suppose we want the mean pressure for each category/year combination.
▶ The tapply() function can group values by combinations of levels from
multiple factors.
▶ When using multiple factors in tapply(), put the factors in a list in the INDEX
argument.

# Compute the mean pressure for each category/year combination

tapply(X = pressure, INDEX = list(category, year), FUN = mean)

##    2006     2007     2008     2009     2010     2011
## 1 983.9 981.5217 979.8158 977.9524 977.1948 979.3000
## 2 969.5 973.6000 957.0385 967.5000 966.5862 964.5385
## 3 957.0 948.0000 955.4286 953.6667 955.0000 954.6923
## 4    NA 933.7143 945.3750 948.8000 938.5238 942.6667
## 5    NA 924.3000       NA       NA       NA       NA
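
The NA entries correspond to combinations with no observations. A small sketch on made-up data (score, grp, and yr are hypothetical names) shows the same pattern: the empty cell in the cross-tabulation lines up with the NA in the matrix of means.

```r
# Hypothetical data with two grouping factors
score <- c(10, 12, 9, 15, 11)
grp <- factor(c("a", "a", "b", "b", "a"))
yr <- factor(c(2020, 2021, 2020, 2020, 2021))

cell_means <- tapply(score, list(grp, yr), mean)   # matrix: rows = grp, columns = yr
cell_counts <- table(grp, yr)                      # observations per combination

cell_means    # b/2021 is NA because no observation falls in that cell
cell_counts   # b/2021 has a count of 0
```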

Question: How would you find out how many observations are in each category (or ...
Last Slide: Why I don’t like Factors

▶ Factors can be unintuitive, especially with the default alphabetical ordering.
▶ Modifying factors (e.g., adding levels) is cumbersome and can lead to
unexpected behavior, such as warnings and NA values.
▶ Arithmetic operations and other functions often don’t handle factors as expected
(character labels vs. internal storage as integers).
▶ Just use character vectors; they are simpler and more transparent for categorical
variables.

Why I Still Teach Factors

▶ Factors are a foundational data type in base R, widely used and often
encountered in code and data.
▶ Understanding factors is essential for using R, including in many R packages and
statistical functions.
