Uber Analysis Python Project in R

Uploaded by

Kashif Majeed

Project in R – Uber Data Analysis Project
https://data-flair.training/blogs/r-data-science-project-uber-data-analysis/
Welcome to part 2 of R and Data Science Projects designed by DataFlair. In our series
of R projects, we are trying to use all the concepts related to Machine learning, AI and
Data Science.

We recommend that you follow all the steps given in the projects so that you master
the technology quickly. In today's R project, we will analyze the Uber Pickups in
New York City dataset. This is mainly a data visualization project: it will guide you
in using the ggplot2 library to understand the data and to build an intuition about
the customers who take the trips. So, before we start, take a moment to revise data
visualization concepts.

R Data Science Project – Uber Data Analysis

Talking about our Uber data analysis project, data storytelling is an important
component of Machine Learning through which companies are able to understand the
background of various operations. With the help of visualization, companies can make
sense of complex data and gain insights that help them make decisions. You will learn
how to use ggplot2 on the Uber Pickups dataset and, by the end, master the art of
data visualization in R.

You can download the dataset utilized in this project here – Uber Dataset

1. Importing the Essential Packages

In the first step of our R project, we will import the essential packages that we will use
in this Uber data analysis project. Some of the important R libraries that we will
use are –

• ggplot2

This is the backbone of this project. ggplot2 is the most popular data visualization
library and is widely used for creating aesthetic plots.

• ggthemes

This is more of an add-on to our main ggplot2 library. With it, we can apply extra
themes and scales on top of the mainstream ggplot2 package.

• lubridate

Our dataset involves various time-frames. In order to understand our data in separate
time categories, we will make use of the lubridate package.

• dplyr

This package is the lingua franca of data manipulation in R.

• tidyr

This package will help you to tidy your data. The basic principle of tidyr is that each
variable forms a column, each observation forms a row and each value occupies a
cell.

• DT

With the help of this package, we will be able to interface with the JavaScript library
called DataTables.

• scales

With the help of graphical scales, we can automatically map the data to the correct
scales with well-placed axes and legends.

library(ggplot2)
library(ggthemes)
library(lubridate)
library(dplyr)
library(tidyr)
library(DT)
library(scales)
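If any of the library() calls above fails, the package is probably not installed yet. The following is a minimal sketch, not part of the original project, that checks which of the listed packages are available before loading them. The variable names pkgs, installed and missing_pkgs are our own.

```r
# Packages named in the list above
pkgs <- c("ggplot2", "ggthemes", "lubridate", "dplyr", "tidyr", "DT", "scales")

# requireNamespace() returns FALSE instead of raising an error
# when a package is not installed
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

missing_pkgs <- pkgs[!installed]
if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
}
```

Once nothing is reported as missing, the library() calls should load without errors.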

2. Creating vector of colors to be implemented in our plots

In this step of the data science project, we will create a vector of colors that will be
used in our plotting functions. You can also select your own set of colors.

Code:

colors = c("#CC1011", "#665555", "#05a399", "#cfcaca", "#f5e840", "#0683c9", "#e075b0")

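If you pick your own colors, it helps to confirm that every entry is a valid color and that the vector is long enough for the plots that use it. The sketch below is our own check, assuming the palette is used for at most six months (April to September) and seven weekdays, as in the later sections.

```r
# Palette with the same seven hex codes as the step above
colors <- c("#CC1011", "#665555", "#05a399", "#cfcaca", "#f5e840", "#0683c9", "#e075b0")

# col2rgb() raises an error if any string is not a valid color
rgb_values <- grDevices::col2rgb(colors)

# Assumption: the palette fills at most 6 months or 7 weekdays
n_months <- 6
n_weekdays <- 7
stopifnot(length(colors) >= n_months, length(colors) >= n_weekdays)
```

A palette shorter than the number of categories would make scale_fill_manual() fail when the plot is drawn, so this check catches the problem early.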
3. Reading the Data into their designated variables

Now, we will read several CSV files that contain the data from April 2014 to
September 2014. We will store these in corresponding data frames like apr_data,
may_data, etc. After we have read the files, we will combine all of this data into a
single data frame called 'data_2014'.

To master this R Uber data analysis project, you need to know everything related to
data frames in R.

Then, in the next step, we will format the Date.Time column appropriately and create
factors for time components such as day, month and year.

Code:

# Read the six monthly files (April to September 2014)
apr_data <- read.csv("uber-raw-data-apr14.csv")
may_data <- read.csv("uber-raw-data-may14.csv")
jun_data <- read.csv("uber-raw-data-jun14.csv")
jul_data <- read.csv("uber-raw-data-jul14.csv")
aug_data <- read.csv("uber-raw-data-aug14.csv")
sep_data <- read.csv("uber-raw-data-sep14.csv")

# Stack the monthly data frames into one
data_2014 <- rbind(apr_data, may_data, jun_data, jul_data, aug_data, sep_data)

# Parse the timestamp; the format string must stay on a single line
data_2014$Date.Time <- as.POSIXct(data_2014$Date.Time, format = "%m/%d/%Y %H:%M:%S")
data_2014$Time <- format(as.POSIXct(data_2014$Date.Time, format = "%m/%d/%Y %H:%M:%S"), format = "%H:%M:%S")
data_2014$Date.Time <- ymd_hms(data_2014$Date.Time)

# Derive date components as factors for grouping and plotting
data_2014$day <- factor(day(data_2014$Date.Time))
data_2014$month <- factor(month(data_2014$Date.Time, label = TRUE))
data_2014$year <- factor(year(data_2014$Date.Time))
data_2014$dayofweek <- factor(wday(data_2014$Date.Time, label = TRUE))

Code:

data_2014$hour <- factor(hour(hms(data_2014$Time)))

data_2014$minute <- factor(minute(hms(data_2014$Time)))
data_2014$second <- factor(second(hms(data_2014$Time)))
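To see what the date handling does without loading the full dataset, here is a minimal sketch on three made-up timestamps. The data frame trips and its values are hypothetical; only the format string is taken from the code above. The time zone is fixed to UTC here so the result does not depend on the machine's settings, which is an assumption the original code does not make.

```r
library(lubridate)

# Hypothetical rows in the "%m/%d/%Y %H:%M:%S" layout
trips <- data.frame(
  Date.Time = c("4/1/2014 0:11:00", "5/15/2014 17:45:30", "9/30/2014 23:59:59"),
  stringsAsFactors = FALSE
)

# Parse the text into date-time values
trips$Date.Time <- as.POSIXct(trips$Date.Time,
                              format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# A failed parse shows up as NA, so check before going further
stopifnot(!anyNA(trips$Date.Time))

# Components can be read straight from the parsed column
trips$hour <- factor(hour(trips$Date.Time))
trips$day <- factor(day(trips$Date.Time))
trips$month <- factor(month(trips$Date.Time, label = TRUE))
trips$dayofweek <- factor(wday(trips$Date.Time, label = TRUE))

print(trips)
```

Reading hour() directly from the parsed column gives the same hour values as going through the separate Time string, so either route can be used.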

Plotting the trips by the hours in a day

In the next step of our R project, we will use the ggplot function to plot the number of
trips that passengers made in each hour of the day. We will also use dplyr to aggregate
our data. In the resulting visualizations, we can see how the number of trips varies
throughout the day. We observe that the number of trips is highest in the evening,
around 5:00 and 6:00 PM.

hour_data <- data_2014 %>%
  group_by(hour) %>%
  dplyr::summarize(Total = n())
datatable(hour_data)
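The grouping step above counts one row per trip within each hour. As a minimal sketch, here is the same pattern on a tiny made-up data frame (toy and hour_toy are hypothetical names), followed by a lookup of the busiest hour.

```r
library(dplyr)

# Hypothetical data: six trips with their pickup hour
toy <- data.frame(hour = factor(c(17, 17, 18, 9, 17, 18)))

# One row per hour, with the number of trips in that hour
hour_toy <- toy %>%
  group_by(hour) %>%
  summarize(Total = n())

# Hour with the most trips in the toy data
busiest <- as.character(hour_toy$hour[which.max(hour_toy$Total)])
print(hour_toy)
print(busiest)
```

On the real data, the same which.max() lookup on hour_data would give the hour behind the tallest bar in the plot.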

Code:
# Bar chart of total trips for each hour of the day
ggplot(hour_data, aes(hour, Total)) +
  geom_bar(stat = "identity", fill = "steelblue", color = "red") +
  ggtitle("Trips Every Hour") +
  theme(legend.position = "none") +
  scale_y_continuous(labels = comma)

# Aggregate by month and hour, then stack the months within each hourly bar
month_hour <- data_2014 %>%
  group_by(month, hour) %>%
  dplyr::summarize(Total = n())

ggplot(month_hour, aes(hour, Total, fill = month)) +
  geom_bar(stat = "identity") +
  ggtitle("Trips by Hour and Month") +
  scale_y_continuous(labels = comma)
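Two details of the plotting code are worth trying on a small example: geom_col() is the ggplot2 shorthand for geom_bar(stat = "identity"), and comma from the scales package is what formats the axis labels. The sketch below uses made-up totals (the toy data frame is hypothetical) and builds the plot object without drawing it.

```r
library(ggplot2)
library(scales)

# Hypothetical pre-aggregated totals for four hours
toy <- data.frame(hour = factor(0:3), Total = c(1200, 800, 450, 15000))

# geom_col() draws bar heights directly from the Total column
p <- ggplot(toy, aes(hour, Total)) +
  geom_col(fill = "steelblue") +
  ggtitle("Trips Every Hour") +
  scale_y_continuous(labels = comma)

# comma() turns a number into a label with thousands separators
label_example <- comma(15000)
print(label_example)

# print(p)  # uncomment to draw the chart
```

Storing the plot in p first makes it easy to add layers or save it later with ggsave().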

Plotting data by trips during every day of the month

In this section of the DataFlair R project, we will learn how to plot our data based on
every day of the month. We observe from the resulting visualization that the 30th of
the month had the highest number of trips in the year, which is mostly contributed by
the month of April.

Code:

day_group <- data_2014 %>%
  group_by(day) %>%
  dplyr::summarize(Total = n())
datatable(day_group)

Code:

ggplot(day_group, aes(day, Total)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  ggtitle("Trips Every Day") +
  theme(legend.position = "none") +
  scale_y_continuous(labels = comma)

Code:

day_month_group <- data_2014 %>%
  group_by(month, day) %>%
  dplyr::summarize(Total = n())

ggplot(day_month_group, aes(day, Total, fill = month)) +
  geom_bar(stat = "identity") +
  ggtitle("Trips by Day and Month") +
  scale_y_continuous(labels = comma) +
  scale_fill_manual(values = colors)

Number of Trips taking place during months in a year

In this section, we will visualize the number of trips taking place in each month of
the year. In the output visualization, we observe that most trips were made during
the month of September. Furthermore, we also obtain visual reports of the number of
trips made on every day of the week.

Code:

month_group <- data_2014 %>%
  group_by(month) %>%
  dplyr::summarize(Total = n())
datatable(month_group)

Code:

ggplot(month_group, aes(month, Total, fill = month)) +
  geom_bar(stat = "identity") +
  ggtitle("Trips by Month") +
  theme(legend.position = "none") +
  scale_y_continuous(labels = comma) +
  scale_fill_manual(values = colors)

Code:

month_weekday <- data_2014 %>%
  group_by(month, dayofweek) %>%
  dplyr::summarize(Total = n())

ggplot(month_weekday, aes(month, Total, fill = dayofweek)) +
  geom_bar(stat = "identity", position = "dodge") +
  ggtitle("Trips by Day and Month") +
  scale_y_continuous(labels = comma) +
  scale_fill_manual(values = colors)

Finding out the number of Trips by bases

In the following visualization, we plot the number of trips taken by passengers from
each of the bases. There are five bases in all, of which B02617 had the highest
number of trips. Furthermore, this base had the highest number of trips in the month
B02617. Thursday observed the highest number of trips in three bases – B02598,
B02617 and B02682.

Code:

ggplot(data_2014, aes(Base)) +
geom_bar(fill = "darkred") +
scale_y_continuous(labels = comma) +
ggtitle("Trips by Bases")

Code:

ggplot(data_2014, aes(Base, fill = month)) +
  geom_bar(position = "dodge") +
  scale_y_continuous(labels = comma) +
  ggtitle("Trips by Bases and Month") +
  scale_fill_manual(values = colors)

Code:
ggplot(data_2014, aes(Base, fill = dayofweek)) +
geom_bar(position = "dodge") +
scale_y_continuous(labels = comma) +
ggtitle("Trips by Bases and DayofWeek") +
scale_fill_manual(values = colors)
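Statements such as "Thursday had the most trips at a base" can be read off the chart, but they can also be checked in a table. The following is a minimal sketch on made-up data, not the project's own code: the base labels B1 and B2 and the names toy and top_day are hypothetical.

```r
library(dplyr)

# Hypothetical trips: one row per trip, with its base and weekday
toy <- data.frame(
  Base = c("B1", "B1", "B1", "B2", "B2"),
  dayofweek = c("Thu", "Thu", "Fri", "Mon", "Mon")
)

# Count trips per base and weekday, then keep the busiest weekday per base
top_day <- toy %>%
  count(Base, dayofweek, name = "Total") %>%
  group_by(Base) %>%
  slice_max(Total, n = 1) %>%
  ungroup()

print(top_day)
```

Note that slice_max() keeps ties by default, so a base with two equally busy weekdays would appear twice in the result.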

Creating a Heatmap visualization of day, hour and month

In this section, we will learn how to plot heatmaps using ggplot(). We will plot five
heatmap plots –

• First, we will plot a Heatmap by Hour and Day.
• Second, we will plot a Heatmap by Month and Day.
• Third, a Heatmap by Month and Day of the Week.
• Fourth, a Heatmap that delineates Month and Bases.
• Finally, we will plot the heatmap by bases and day of the week.

Code:

day_and_hour <- data_2014 %>%
  group_by(day, hour) %>%
  dplyr::summarize(Total = n())
datatable(day_and_hour)

Code:

ggplot(day_and_hour, aes(day, hour, fill = Total)) +
  geom_tile(color = "white") +
  ggtitle("Heat Map by Hour and Day")
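Each tile in a heat map is one row of the grouped table, so the numbers behind the colors can be checked by spreading that table into a day-by-hour grid. The sketch below does this on made-up data with pivot_wider() from tidyr. The names toy, day_hour_toy and wide are hypothetical, and .groups = "drop" is our addition to return an ungrouped result.

```r
library(dplyr)
library(tidyr)

# Hypothetical trips: one row per trip, with day of month and hour
toy <- data.frame(
  day = factor(c(1, 1, 1, 2, 2, 3)),
  hour = factor(c(8, 8, 17, 8, 17, 17))
)

# Long form: one row per (day, hour) pair, as used by geom_tile()
day_hour_toy <- toy %>%
  group_by(day, hour) %>%
  summarize(Total = n(), .groups = "drop")

# Wide form: one row per day, one column per hour, 0 where no trips occurred
wide <- day_hour_toy %>%
  pivot_wider(names_from = hour, values_from = Total, values_fill = 0)

print(wide)
```

A combination with no trips has no row in the long table, which is why it appears as an empty tile in the heat map and as 0 in the wide table.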

Code:

ggplot(day_month_group, aes(day, month, fill = Total)) +
  geom_tile(color = "white") +
  ggtitle("Heat Map by Month and Day")

Code:

ggplot(month_weekday, aes(dayofweek, month, fill = Total)) +
  geom_tile(color = "white") +
  ggtitle("Heat Map by Month and Day of Week")

Code:

month_base <- data_2014 %>%
  group_by(Base, month) %>%
  dplyr::summarize(Total = n())

day0fweek_bases <- data_2014 %>%
  group_by(Base, dayofweek) %>%
  dplyr::summarize(Total = n())

ggplot(month_base, aes(Base, month, fill = Total)) +
  geom_tile(color = "white") +
  ggtitle("Heat Map by Month and Bases")

Code:

ggplot(day0fweek_bases, aes(Base, dayofweek, fill = Total)) +
  geom_tile(color = "white") +
  ggtitle("Heat Map by Bases and Day of Week")

Creating a map visualization of rides in New York

In the final section, we will visualize the rides in New York City by creating a geo-plot
that shows the rides during 2014 (Apr – Sep), first overall and then colored by base.

Code:
# Bounding box for the New York City area
min_lat <- 40.5774
max_lat <- 40.9176
min_long <- -74.15
max_long <- -73.7004

# All rides as points
ggplot(data_2014, aes(x = Lon, y = Lat)) +
  geom_point(size = 1, color = "blue") +
  scale_x_continuous(limits = c(min_long, max_long)) +
  scale_y_continuous(limits = c(min_lat, max_lat)) +
  theme_map() +
  ggtitle("NYC MAP BASED ON UBER RIDES DURING 2014 (APR-SEP)")

# The same rides, colored by base
ggplot(data_2014, aes(x = Lon, y = Lat, color = Base)) +
  geom_point(size = 1) +
  scale_x_continuous(limits = c(min_long, max_long)) +
  scale_y_continuous(limits = c(min_lat, max_lat)) +
  theme_map() +
  ggtitle("NYC MAP BASED ON UBER RIDES DURING 2014 (APR-SEP) by BASE")
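The axis limits in the map code act as a bounding box: points outside them are dropped from the plot, and ggplot2 reports the removed rows in a warning. As a minimal sketch, the same box can be applied explicitly with dplyr before plotting, which also shows how many rides fall inside it. The three coordinates in toy are made up; the limits are the ones used above.

```r
library(dplyr)

# Bounding box values from the map code
min_lat <- 40.5774
max_lat <- 40.9176
min_long <- -74.15
max_long <- -73.7004

# Hypothetical pickups: the third one lies north of the box
toy <- data.frame(
  Lat = c(40.7580, 40.6413, 41.2000),
  Lon = c(-73.9855, -73.7781, -73.9000)
)

# Keep only the rows whose coordinates fall inside the box
in_box <- toy %>%
  filter(between(Lat, min_lat, max_lat),
         between(Lon, min_long, max_long))

print(nrow(in_box))
```

Filtering first means the plot receives only the points it will actually draw, which can also reduce the work for a dataset of this size.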

Summary

At the end of the Uber data analysis R project, we observed how to create data
visualizations. We made use of packages like ggplot2 that allowed us to plot various
types of visualizations that pertained to several time-frames of the year. With this, we
could conclude how time affected customer trips. Finally, we made a geo plot of New
York that provided us with the details of how various users made trips from different
bases.

Hope you enjoyed the above R Data Science Project. Keep visiting DataFlair for
more interesting projects related to the latest technologies like Big Data, R and Data
Science. If you face any issue while practicing, leave a comment below. We will
definitely help.

