DS-R Block 3-1 All

The document provides an overview of the dplyr package in R, which is essential for data manipulation and transformation. It covers key features, functions, and operations such as filtering, selecting, mutating, summarizing, grouping, and arranging data, along with examples. Additionally, it explains the use of the pipe operator for chaining operations and the various types of joins available for data blending.

Uploaded by

Jeya preetha

Data Science using R

Introduction to dplyr

© Kalasalingam Academy of Research and Education

dplyr is a powerful R package designed for data manipulation and transformation,
making it easier to work with data frames and perform common data operations. It is part of
the tidyverse, a collection of R packages that share an underlying design philosophy and
grammar, which makes data science in R more efficient and intuitive.

Features of dplyr

• Simplified Syntax: dplyr offers a clean and consistent set of functions that allow for
straightforward data manipulation.

• Chaining Operations: You can use the pipe operator (%>%) to chain together multiple
operations, making your code more readable and concise.

• Performance: dplyr is optimized for performance, especially with large datasets,
making it suitable for data analysis tasks.
Functions of dplyr
1. filter()
The filter() function is used to select rows from a data frame that meet specific conditions.
Syntax filter(data, condition)
Example
# Load dplyr
library(dplyr)

# Create a sample data frame

df <- data.frame(
Name = c("Alice", "Bob", "Charlie", "David"),
Age = c(25, 30, 35, 40),
Salary = c(50000, 60000, 70000, 80000)
)
# Filter rows where Age is greater than 30
filtered_df <- filter(df, Age > 30)
print(filtered_df)
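As a small extension of the example above, filter() also accepts more than one condition. The sketch below assumes a made-up data frame called `staff` (not part of the original slides); comma-separated conditions are combined with AND, while `|` expresses OR.

```r
library(dplyr)

# Hypothetical data frame used only for this illustration
staff <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David"),
  Age = c(25, 30, 35, 40),
  Salary = c(50000, 60000, 70000, 80000)
)

# Comma-separated conditions must all be TRUE (logical AND)
both <- filter(staff, Age > 25, Salary < 80000)

# Use | when either condition is enough (logical OR)
either <- filter(staff, Age < 30 | Salary >= 80000)

print(both)
print(either)
```

Here `both` keeps Bob and Charlie, and `either` keeps Alice and David.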
2. select()
The select() function is used to choose specific columns from a data frame.
Syntax select(data, columns)
Example
# Select the Name and Salary columns
selected_df <- select(df, Name, Salary)
print(selected_df)
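Beyond listing column names, select() can drop columns, match names with helper functions, and rename while selecting. The following is a minimal sketch on a made-up data frame `staff`; the new column name `Employee` is chosen only for illustration.

```r
library(dplyr)

# Hypothetical data frame used only for this illustration
staff <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David"),
  Age = c(25, 30, 35, 40),
  Salary = c(50000, 60000, 70000, 80000)
)

# Drop a column by prefixing it with a minus sign
no_age <- select(staff, -Age)

# Keep only columns whose names start with "S"
s_cols <- select(staff, starts_with("S"))

# Rename a column while selecting it (new_name = old_name)
renamed <- select(staff, Employee = Name, Salary)

print(names(no_age))
print(names(s_cols))
print(names(renamed))
```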
3. mutate()
The mutate() function is used to create new columns or modify existing ones.
Syntax mutate(data, new_column = expression)
Example
# Add a new column for Annual Salary
mutated_df <- mutate(df, Annual_Salary = Salary * 12)
print(mutated_df)
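mutate() can also overwrite an existing column and create several columns in one call; later expressions see the results of earlier ones. The sketch below uses a made-up data frame `staff`, and the raise of 5000 and the 10% bonus rate are arbitrary values assumed for the example.

```r
library(dplyr)

# Hypothetical data frame used only for this illustration
staff <- data.frame(
  Name = c("Alice", "Bob"),
  Salary = c(50000, 60000)
)

updated <- mutate(
  staff,
  Salary = Salary + 5000,  # overwrite the existing column
  Bonus = Salary * 0.10    # uses the updated Salary from the line above
)

print(updated)
```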
4. summarise() (or summarize())

The summarise() function is used to compute summary statistics for a data frame.

Syntax summarise(data, summary_statistic = function(column))

Example

# Calculate the average salary

summary_df <- summarise(df, Average_Salary = mean(Salary))
print(summary_df)
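A single summarise() call can return several statistics at once, each as its own column. This is a minimal sketch on a made-up data frame `staff`; the output column names are illustrative.

```r
library(dplyr)

# Hypothetical data frame used only for this illustration
staff <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David"),
  Age = c(25, 30, 35, 40),
  Salary = c(50000, 60000, 70000, 80000)
)

stats <- summarise(
  staff,
  Employees = n(),            # number of rows
  Min_Salary = min(Salary),
  Max_Salary = max(Salary),
  Median_Age = median(Age)
)

print(stats)
```

The result is a one-row data frame with one column per statistic.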
5. group_by()
The group_by() function is used to group data by one or more variables. It is often used in
conjunction with summarise() to perform calculations on grouped data.
Syntax group_by(data, grouping_variable)
Example
# Create another data frame with a department column
df2 <- data.frame(
Name = c("Alice", "Bob", "Charlie", "David"),
Department = c("HR", "IT", "IT", "HR"),
Salary = c(50000, 60000, 70000, 80000)
)
# Group by Department and calculate the average salary
grouped_df <- df2 %>%
group_by(Department) %>%
summarise(Average_Salary = mean(Salary))
print(grouped_df)
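Grouping also changes how mutate() behaves: summary functions used inside mutate() are then computed per group, while every row is kept. The sketch below assumes a made-up data frame `staff`, and the column name `Diff_From_Dept_Mean` is chosen only for illustration.

```r
library(dplyr)

# Hypothetical data frame used only for this illustration
staff <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David"),
  Department = c("HR", "IT", "IT", "HR"),
  Salary = c(50000, 60000, 70000, 80000)
)

with_diff <- staff %>%
  group_by(Department) %>%
  # mean(Salary) is evaluated separately within each department
  mutate(Diff_From_Dept_Mean = Salary - mean(Salary)) %>%
  ungroup()  # remove the grouping so later steps act on the whole table

print(with_diff)
```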
6. arrange()

The arrange() function is used to reorder rows in a data frame based on one or more
variables.

Syntax arrange(data, column)

Example

# Arrange the data frame by Salary in descending order

arranged_df <- arrange(df, desc(Salary))
print(arranged_df)
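arrange() accepts several sort keys; later keys break ties in earlier ones. A minimal sketch on a made-up data frame `staff`, sorting by department first and then by salary from highest to lowest within each department:

```r
library(dplyr)

# Hypothetical data frame used only for this illustration
staff <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David"),
  Department = c("HR", "IT", "IT", "HR"),
  Salary = c(50000, 60000, 70000, 80000)
)

# Department ascending, then Salary descending inside each department
sorted <- arrange(staff, Department, desc(Salary))

print(sorted)
```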
7. Join Functions

dplyr provides several functions for joining data frames, including inner_join(),
left_join(), right_join(), and full_join().

Example
# Create another data frame for joining
df3 <- data.frame(
Name = c("Alice", "Bob"),
Department = c("HR", "IT")
)
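# Inner join df2 with df3 on Name
# Both data frames also contain Department, so the result has Department.x and Department.y
joined_df <- inner_join(df2, df3, by = "Name")
print(joined_df)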
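When two data frames share a column that is not used as the key, as df2 and df3 above share Department, dplyr keeps both copies and adds the suffixes .x and .y. One way to avoid this, sketched below with made-up data frames `pay` and `dept`, is to join on both columns.

```r
library(dplyr)

# Hypothetical data frames used only for this illustration
pay <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David"),
  Department = c("HR", "IT", "IT", "HR"),
  Salary = c(50000, 60000, 70000, 80000)
)
dept <- data.frame(
  Name = c("Alice", "Bob"),
  Department = c("HR", "IT")
)

# Joining on Name only: Department appears twice, with suffixes
by_name <- inner_join(pay, dept, by = "Name")
print(names(by_name))

# Joining on both shared columns: a single Department column remains
by_both <- inner_join(pay, dept, by = c("Name", "Department"))
print(names(by_both))
```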
8. Pipe Operator (%>%)

The pipe operator (%>%) comes from the magrittr package and is re-exported by dplyr. It
allows you to chain multiple operations together in a readable way.

Example

# Using pipe to chain operations

result_df <- df %>%
filter(Age > 30) %>%
select(Name, Salary) %>%
mutate(Annual_Salary = Salary * 12)
print(result_df)
Data manipulation in R with dplyr
Data manipulation in R with dplyr is a key aspect of data analysis, allowing you to
clean, transform, and summarize data efficiently. Below, we will explore various common
data manipulation tasks using the dplyr package, including filtering, selecting, mutating,
summarizing, grouping, and arranging data.

Setting Up dplyr

Before using dplyr, ensure that you have it installed and loaded into your R session.

# Install dplyr if you haven't already

install.packages("dplyr")

# Load the dplyr package

library(dplyr)
Sample Data

Let's create a sample data frame that we will use for our examples:

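# Create a sample data frame (the same one defined in the next section)
data <- data.frame(
Name = c("Alice", "Bob", "Charlie", "David", "Eva"),
Age = c(25, 30, 35, 40, 28),
Salary = c(50000, 60000, 70000, 80000, 65000),
Department = c("HR", "IT", "IT", "HR", "Finance")
)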

1. Filtering Rows with filter()

The filter() function is used to subset rows based on specific conditions.

# Filter rows where Age is greater than 30

filtered_data <- filter(data, Age > 30)
print(filtered_data)
2. Selecting Columns with select()

The select() function is used to choose specific columns from a data frame.

# Select the Name and Salary columns

selected_data <- select(data, Name, Salary)
print(selected_data)
Common Data Manipulation Tasks
3. Adding New Columns with mutate()

The mutate() function allows you to create new columns or modify existing ones.

# Add a new column for Annual Salary

mutated_data <- mutate(data, Annual_Salary = Salary * 12)
print(mutated_data)
4. Summarizing Data with summarise()

The summarise() function is used to calculate summary statistics.

# Calculate the average salary

summary_data <- summarise(data, Average_Salary = mean(Salary))
print(summary_data)
5. Grouping Data with group_by() and Summarizing
The group_by() function is used to group data by one or more variables. It is often followed by
summarise() to perform calculations on each group.
# Group by Department and calculate the average salary
grouped_data <- data %>%
group_by(Department) %>%
summarise(Average_Salary = mean(Salary), .groups = "drop")

print(grouped_data)
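The `.groups = "drop"` argument controls whether the result of summarise() is still grouped. The sketch below assumes dplyr 1.0.0 or later and a made-up data frame `staff` with a hypothetical `Level` column; it groups by two variables so the difference becomes visible.

```r
library(dplyr)

# Hypothetical data frame used only for this illustration
staff <- data.frame(
  Department = c("HR", "HR", "IT", "IT"),
  Level = c("Junior", "Senior", "Junior", "Senior"),
  Salary = c(50000, 80000, 60000, 70000)
)

# "drop_last" removes only the last grouping variable (Level)
kept <- staff %>%
  group_by(Department, Level) %>%
  summarise(Average_Salary = mean(Salary), .groups = "drop_last")

# "drop" removes all grouping, returning an ungrouped table
dropped <- staff %>%
  group_by(Department, Level) %>%
  summarise(Average_Salary = mean(Salary), .groups = "drop")

print(group_vars(kept))     # grouping variables still attached
print(group_vars(dropped))  # no grouping variables left
```

Dropping the grouping avoids surprises when later steps, such as mutate() or arrange(), are expected to act on the whole table.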

6. Arranging Rows with arrange()

The arrange() function is used to reorder rows based on the values of one or more columns.
# Arrange the data frame by Salary in descending order
arranged_data <- arrange(data, desc(Salary))
print(arranged_data)

7. Chaining Operations with the Pipe Operator (%>%)

The pipe operator allows you to chain multiple dplyr functions together for a more
readable workflow.

# Chain operations to filter, select, and mutate data

result_data <- data %>%
filter(Age < 35) %>%
select(Name, Salary) %>%
mutate(Annual_Salary = Salary * 12)

print(result_data)
8. Joining Data with Join Functions

dplyr provides various functions for joining data frames, such as inner_join(), left_join(),
right_join(), and full_join().

Example of left_join()

# Create another data frame for joining

additional_data <- data.frame(
Name = c("Alice", "Bob", "Charlie"),
Bonus = c(5000, 6000, 7000)
)
# Perform a left join
joined_data <- left_join(data, additional_data, by = "Name")
print(joined_data)
Selecting, Mutating, Filtering, Arranging and Summarising
In R programming, particularly when using the dplyr package, the functions for
selecting, mutating, filtering, arranging, and summarizing data frames are essential for
effective data manipulation. Below, we will explore each of these operations in detail,
along with examples to illustrate their usage.

Setting Up dplyr

Before we start, make sure you have the dplyr package installed and loaded in your R
session:
# Install dplyr if you haven't already
install.packages("dplyr")
# Load the dplyr package
library(dplyr)
Sample Data Frame

We will use a sample data frame for demonstration purposes:

# Create a sample data frame

data <- data.frame(
Name = c("Alice", "Bob", "Charlie", "David", "Eva"),
Age = c(25, 30, 35, 40, 28),
Salary = c(50000, 60000, 70000, 80000, 65000),
Department = c("HR", "IT", "IT", "HR", "Finance")
)
# Print the original data
print(data)
1. Selecting Columns with select()
The select() function is used to choose specific columns from a data frame.
Syntax select(data, columns)
Example
# Select the Name and Salary columns
selected_data <- select(data, Name, Salary)
print(selected_data)
2. Mutating Data with mutate()
The mutate() function is used to add new columns or modify existing ones.
Syntax mutate(data, new_column = expression)
Example
# Add a new column for Annual Salary
mutated_data <- mutate(data, Annual_Salary = Salary * 12)
print(mutated_data)
3. Filtering Rows with filter()
The filter() function is used to subset rows based on specific conditions.
Syntax filter(data, condition)
Example
# Filter rows where Age is greater than 30
filtered_data <- filter(data, Age > 30)
print(filtered_data)
4. Arranging Rows with arrange()
The arrange() function is used to reorder rows based on the values of one or more columns.
Syntax arrange(data, column)
Example
# Arrange the data frame by Salary in descending order
arranged_data <- arrange(data, desc(Salary))
print(arranged_data)
5. Summarizing Data with summarise()
The summarise() function is used to calculate summary statistics for one or more columns.
Syntax summarise(data, summary_statistic = function(column))
Example
# Calculate the average salary
summary_data <- summarise(data, Average_Salary = mean(Salary))
print(summary_data)
6. Grouping Data with group_by()
The group_by() function is often used in conjunction with summarise() to perform calculations
on grouped data.
Syntax group_by(data, grouping_variable)
Example
# Group by Department and calculate the average salary
grouped_data <- data %>%
group_by(Department) %>%
summarise(Average_Salary = mean(Salary), .groups = "drop")
print(grouped_data)
7. Combining Operations

You can combine these operations using the pipe operator (%>%) for a more readable
workflow.

Example

# Combine filtering, selecting, and mutating in one chain

result_data <- data %>%
filter(Age < 35) %>%
select(Name, Salary) %>%
mutate(Annual_Salary = Salary * 12)
print(result_data)
Pipe Operator in R Programming
The pipe operator (%>%) in R, provided by the magrittr package (which is also
part of the tidyverse), is a powerful tool for chaining together multiple functions in a
clean and readable way. It allows you to take the output of one function and pass it
directly as an input to the next function, enabling a streamlined workflow in data
manipulation and analysis.

Benefits of Using the Pipe Operator

1. Readability: The pipe operator enhances the readability of your code by allowing
you to express a sequence of operations in a linear fashion, resembling natural
language.
2. Conciseness: It reduces the need for temporary variables and makes the code
cleaner.
3. Chaining Functions: It allows you to easily combine multiple operations without
nesting functions.
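To make the benefits listed above concrete, the sketch below writes the same three steps twice, once as nested calls and once with the pipe, on a made-up data frame `staff`. Since `x %>% f(y)` is evaluated as `f(x, y)`, both versions should give the same result.

```r
library(dplyr)

# Hypothetical data frame used only for this illustration
staff <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David", "Eva"),
  Age = c(25, 30, 35, 40, 28),
  Salary = c(50000, 60000, 70000, 80000, 65000)
)

# Nested calls: must be read from the inside out
nested <- arrange(select(filter(staff, Age > 28), Name, Salary), desc(Salary))

# Piped calls: read top to bottom in the order the steps happen
piped <- staff %>%
  filter(Age > 28) %>%
  select(Name, Salary) %>%
  arrange(desc(Salary))

print(identical(nested, piped))
```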
Basic Syntax
The general syntax for using the pipe operator is as follows:

data %>%
function1(arguments) %>%
function2(arguments) %>%
function3(arguments)
In this syntax:

• data is the initial dataset.

• function1, function2, and function3 are the functions you want to apply sequentially.

Example of the Pipe Operator

Let's walk through a comprehensive example using a sample data frame to illustrate how the
pipe operator works in R.
Setting Up
First, ensure that you have the necessary packages installed and loaded:
# Install the tidyverse if you haven't already
install.packages("tidyverse")
# Load the dplyr package
library(dplyr)
Sample Data Frame
We will use the following sample data frame for our examples:
# Create a sample data frame
data <- data.frame(
Name = c("Alice", "Bob", "Charlie", "David", "Eva"),
Age = c(25, 30, 35, 40, 28),
Salary = c(50000, 60000, 70000, 80000, 65000),
Department = c("HR", "IT", "IT", "HR", "Finance")
)
Example: Using the Pipe Operator

Here’s how you can use the pipe operator to perform a series of operations on the data:

# Using the pipe operator for data manipulation

result <- data %>%
filter(Age > 28) %>% # Filter rows where Age is greater than 28
select(Name, Salary) %>% # Select only Name and Salary columns
mutate(Annual_Salary = Salary * 12) %>% # Create a new column for Annual Salary
arrange(desc(Annual_Salary)) # Arrange the data by Annual Salary in descending order
# Print the result
print(result)
Data blending and joining in R programming

Data blending and joining in R involve combining multiple datasets into a single
cohesive dataset for analysis. This is a crucial step in data preparation,
allowing you to create a unified dataset that includes all relevant information from
different sources.

In R, the dplyr package provides a set of functions for joining data frames. Below are
the most common types of joins, along with examples to demonstrate their usage.
Setting Up dplyr

Ensure that you have the dplyr package installed and loaded:

# Install dplyr if you haven't already

install.packages("dplyr")

# Load the dplyr package

library(dplyr)
Sample Data Frames
Let’s create two sample data frames that we can use for our joining examples:
# Create the first data frame
employees <- data.frame(
Employee_ID = c(1, 2, 3, 4, 5),
Name = c("Alice", "Bob", "Charlie", "David", "Eva"),
Department = c("HR", "IT", "IT", "HR", "Finance")
)
# Create the second data frame
salaries <- data.frame(
Employee_ID = c(1, 2, 3, 4, 6),
Salary = c(50000, 60000, 70000, 80000, 90000)
)
# Print the data frames
print(employees)
print(salaries)
Types of Joins in dplyr
1. Inner Join (inner_join()): Returns only the rows with matching values in both data
frames.
inner_joined <- inner_join(employees, salaries, by = "Employee_ID")
print(inner_joined)
2. Left Join (left_join()): Returns all rows from the left data frame and the matched rows
from the right data frame. If there is no match, the result will contain NA for columns
from the right data frame.
left_joined <- left_join(employees, salaries, by = "Employee_ID")
print(left_joined)
3. Right Join (right_join()): Returns all rows from the right data frame and the matched
rows from the left data frame. If there is no match, the result will contain NA for
columns from the left data frame.
right_joined <- right_join(employees, salaries, by = "Employee_ID")
print(right_joined)
4. Full Join (full_join()): Returns all rows from both data frames, with NA in places
where there is no match.
full_joined <- full_join(employees, salaries, by = "Employee_ID")
print(full_joined)
5. Semi Join (semi_join()): Returns all rows from the left data frame where there are
matching values in the right data frame, but does not include any columns from the right
data frame.
semi_joined <- semi_join(employees, salaries, by = "Employee_ID")
print(semi_joined)
6. Anti Join (anti_join()): Returns all rows from the left data frame where there are no
matching values in the right data frame.
anti_joined <- anti_join(employees, salaries, by = "Employee_ID")
print(anti_joined)
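One way to compare the six join types is to count the rows each one returns. The sketch below rebuilds the employees and salaries data frames from this section so that it runs on its own; the list name `joins` is introduced only for this illustration. Employee 5 has no salary record and salary record 6 has no employee, which is what makes the counts differ.

```r
library(dplyr)

employees <- data.frame(
  Employee_ID = c(1, 2, 3, 4, 5),
  Name = c("Alice", "Bob", "Charlie", "David", "Eva"),
  Department = c("HR", "IT", "IT", "HR", "Finance")
)
salaries <- data.frame(
  Employee_ID = c(1, 2, 3, 4, 6),
  Salary = c(50000, 60000, 70000, 80000, 90000)
)

# Run every join type on the same pair of data frames
joins <- list(
  inner = inner_join(employees, salaries, by = "Employee_ID"),
  left  = left_join(employees, salaries, by = "Employee_ID"),
  right = right_join(employees, salaries, by = "Employee_ID"),
  full  = full_join(employees, salaries, by = "Employee_ID"),
  semi  = semi_join(employees, salaries, by = "Employee_ID"),
  anti  = anti_join(employees, salaries, by = "Employee_ID")
)

# Number of rows returned by each join
row_counts <- sapply(joins, nrow)
print(row_counts)
```

The counts are 4 (inner), 5 (left), 5 (right), 6 (full), 4 (semi) and 1 (anti); the single anti-join row is Eva, the only employee without a salary record.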

Outliers and Missing Values Treatment

Handling outliers and missing values is a crucial part of data preprocessing in any
data analysis or machine learning project. In R, you can use various techniques to
identify, treat, and impute these data issues. Below, we will explore both outliers and
missing values treatment in detail.
Outliers Treatment
Outliers are data points that significantly differ from the rest of the dataset. They can
skew results and lead to misleading conclusions if not handled appropriately.
1. Identifying Outliers
You can identify outliers using several methods:
• Visual Methods: Boxplots and scatter plots can visually reveal outliers.
• Statistical Methods: Use the IQR (Interquartile Range) method or Z-scores.
Example: Using Boxplots and IQR
# Load necessary library
library(ggplot2)
# Create a sample data set
data <- data.frame(
Value = c(10, 12, 12, 13, 12, 15, 18, 19, 100) # 100 is an outlier
)
# Boxplot to visualize outliers
ggplot(data, aes(y = Value)) + geom_boxplot() + ggtitle("Boxplot to Identify Outliers")
Example: Using IQR Method
# Calculate the IQR
Q1 <- quantile(data$Value, 0.25)
Q3 <- quantile(data$Value, 0.75)
IQR_value <- Q3 - Q1
# Determine outlier boundaries
lower_bound <- Q1 - 1.5 * IQR_value
upper_bound <- Q3 + 1.5 * IQR_value
# Identify outliers
outliers <- data$Value[data$Value < lower_bound | data$Value > upper_bound]
print(outliers)
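The Z-score method mentioned earlier can be sketched in base R as follows, using the same nine values. The cutoff of 2.5 is an assumption made for this example: a cutoff of 3 is common, but in a sample of n values no Z-score can exceed (n - 1) / sqrt(n), which is about 2.67 for n = 9, so a cutoff of 3 would never flag anything here.

```r
# Same values as in the IQR example
values <- c(10, 12, 12, 13, 12, 15, 18, 19, 100)

# Z-score: distance from the mean in units of the sample standard deviation
z_scores <- (values - mean(values)) / sd(values)

# Assumed cutoff for this small sample (see the note above)
cutoff <- 2.5

# Values whose absolute Z-score exceeds the cutoff
z_outliers <- values[abs(z_scores) > cutoff]
print(z_outliers)
```

Because the outlier itself inflates the mean and standard deviation, the Z-score method tends to be less sensitive than the IQR method on small samples.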
2. Treating Outliers
Once identified, you can treat outliers in several ways:
• Remove Outliers: Simply exclude them from the dataset.
# drop = FALSE keeps the result a data frame even though there is only one column
data_no_outliers <- data[data$Value >= lower_bound & data$Value <= upper_bound, , drop = FALSE]
• Transform Data: Apply transformations (like log or square root) to reduce the effect of
outliers.
• Impute Values: Replace outliers with a statistical measure (e.g., mean or median).
# Replace outliers with the median
data$Value[data$Value < lower_bound | data$Value > upper_bound] <-
median(data$Value)
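The transformation option listed above has no example in the slides, so here is a minimal base R sketch on the original nine values. It compares how far the largest value sits from the median before and after a log or square-root transformation; the ratio used for the comparison is an illustrative measure chosen for this example, not a standard test.

```r
# Original values, including the extreme value 100 (all values must be positive for log)
values <- c(10, 12, 12, 13, 12, 15, 18, 19, 100)

log_values <- log(values)
sqrt_values <- sqrt(values)

# How many times larger than the median is the maximum?
ratio_raw  <- max(values) / median(values)
ratio_sqrt <- max(sqrt_values) / median(sqrt_values)
ratio_log  <- max(log_values) / median(log_values)

print(c(raw = ratio_raw, sqrt = ratio_sqrt, log = ratio_log))
```

Both transformations pull the extreme value closer to the rest of the data, the log more strongly than the square root. Whether the transformed value still counts as an outlier depends on the data and on the detection rule applied afterwards.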

Missing Values Treatment

Missing values can occur for various reasons and can significantly impact data analysis. Handling
them appropriately is vital.
1. Identifying Missing Values
You can check for missing values with the is.na() function. The summary() function also reports
the number of NAs, but only for numeric and factor columns, not for character columns.
# Create a sample dataset with missing values
data_with_na <- data.frame(
Name = c("Alice", "Bob", NA, "David", "Eva"),
Age = c(25, NA, 35, 40, 28)
)
# Check for missing values
summary(data_with_na)
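The is.na() approach can be sketched in base R as follows. The data frame `people` mirrors the sample above but is defined here so the example runs on its own; the variable names are illustrative.

```r
# Hypothetical copy of the sample data with missing values
people <- data.frame(
  Name = c("Alice", "Bob", NA, "David", "Eva"),
  Age = c(25, NA, 35, 40, 28)
)

# is.na() returns a logical matrix; colSums() counts the TRUEs per column
na_per_column <- colSums(is.na(people))
print(na_per_column)

# complete.cases() is TRUE for rows without any NA, so negate it to find incomplete rows
rows_with_na <- which(!complete.cases(people))
print(rows_with_na)
```

Each column has one missing value, and the incomplete rows are rows 2 and 3.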

2. Treating Missing Values

There are several strategies to handle missing values:
• Remove Rows with Missing Values: This is straightforward but may lead to loss
of important data.
data_cleaned <- na.omit(data_with_na)
• Impute Missing Values: Replace missing values with appropriate substitutes
(mean, median, mode, or using predictive models).
Example: Mean Imputation
# Impute missing age with the mean age
data_with_na$Age[is.na(data_with_na$Age)] <- mean(data_with_na$Age, na.rm = TRUE)
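Median imputation, also listed above, works the same way and is less affected by extreme values than the mean. The sketch below uses dplyr's coalesce(), which returns the first non-missing value at each position, on a made-up data frame `people`.

```r
library(dplyr)

# Hypothetical data frame used only for this illustration
people <- data.frame(
  Name = c("Alice", "Bob", "Charlie", "David", "Eva"),
  Age = c(25, NA, 35, 40, 28)
)

imputed <- people %>%
  # Replace each NA in Age with the median of the observed ages
  mutate(Age = coalesce(Age, median(Age, na.rm = TRUE)))

print(imputed)
```

The median of the observed ages (25, 28, 35, 40) is 31.5, so that value replaces the missing age for Bob.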
Example: Using mice Package for Multiple Imputation
# Install mice package if not installed
install.packages("mice")
library(mice)

# Use mice to impute missing values
imputed_data <- mice(data_with_na, m = 5, method = 'pmm', maxit = 50)
completed_data <- complete(imputed_data)
print(completed_data)
