Uber Drive:
This project is based on trips made by Uber drivers. Here, we analyze different aspects of the trips through Exploratory Data Analysis.
Load the necessary libraries. Import and load the dataset with the name uber_drives.
In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
In [4]:
ud = pd.read_csv("uberdrives.csv")
In [5]:
ud.shape
Out[5]:
(1155, 7)
Q1. Show the last 10 records of the dataset. (2 point)
In [6]:
ud.tail(10)
Out[6]:
START_DATE* END_DATE* CATEGORY* START* STOP* MILES* PURPOSE*
1145 12/30/2016 10:15 12/30/2016 10:33 Business Karachi Karachi 2.8 Errand/Supplies
1146 12/30/2016 11:31 12/30/2016 11:56 Business Karachi Karachi 2.9 Errand/Supplies
1147 12/30/2016 15:41 12/30/2016 16:03 Business Karachi Karachi 4.6 Errand/Supplies
1148 12/30/2016 16:45 12/30/2016 17:08 Business Karachi Karachi 4.6 Meeting
1149 12/30/2016 23:06 12/30/2016 23:10 Business Karachi Karachi 0.8 Customer Visit
1150 12/31/2016 1:07 12/31/2016 1:14 Business Karachi Karachi 0.7 Meeting
1151 12/31/2016 13:24 12/31/2016 13:42 Business Karachi Unknown Location 3.9 Temporary Site
1152 12/31/2016 15:03 12/31/2016 15:38 Business Unknown Location Unknown Location 16.2 Meeting
1153 12/31/2016 21:32 12/31/2016 21:50 Business Katunayake Gampaha 6.4 Temporary Site
1154 12/31/2016 22:08 12/31/2016 23:51 Business Gampaha Ilukwatta 48.2 Temporary Site
Q2. Show the first 10 records of the dataset. (2 points)
In [7]:
ud.head(10)
Out[7]:
START_DATE* END_DATE* CATEGORY* START* STOP* MILES* PURPOSE*
0 01-01-2016 21:11 01-01-2016 21:17 Business Fort Pierce Fort Pierce 5.1 Meal/Entertain
1 01-02-2016 01:25 01-02-2016 01:37 Business Fort Pierce Fort Pierce 5.0 NaN
2 01-02-2016 20:25 01-02-2016 20:38 Business Fort Pierce Fort Pierce 4.8 Errand/Supplies
3 01-05-2016 17:31 01-05-2016 17:45 Business Fort Pierce Fort Pierce 4.7 Meeting
4 01-06-2016 14:42 01-06-2016 15:49 Business Fort Pierce West Palm Beach 63.7 Customer Visit
5 01-06-2016 17:15 01-06-2016 17:19 Business West Palm Beach West Palm Beach 4.3 Meal/Entertain
6 01-06-2016 17:30 01-06-2016 17:35 Business West Palm Beach Palm Beach 7.1 Meeting
7 01-07-2016 13:27 01-07-2016 13:33 Business Cary Cary 0.8 Meeting
8 01-10-2016 08:05 01-10-2016 08:25 Business Cary Morrisville 8.3 Meeting
9 01-10-2016 12:17 01-10-2016 12:44 Business Jamaica New York 16.5 Customer Visit
Q3. Show the dimension (number of rows and columns) of the dataset. (2 points)
In [8]:
ud.shape
Out[8]:
(1155, 7)
In [9]:
print("The number of rows are ",ud.shape[0],"\nThe number of columns are",ud.shape[1])
The number of rows are 1155
The number of columns are 7
Q4. Show the size (Total number of elements) of the dataset. (2 points)
In [10]:
ud.size
Out[10]:
8085
Q5. Display the information about all the variables of the dataset. What can you infer from the output? (1 + 2 points)
Hint: Information includes - total number of columns, variable data types, number of non-null values in a variable, and memory usage
In [11]:
ud.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1155 entries, 0 to 1154
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 START_DATE* 1155 non-null object
1 END_DATE* 1155 non-null object
2 CATEGORY* 1155 non-null object
3 START* 1155 non-null object
4 STOP* 1155 non-null object
5 MILES* 1155 non-null float64
6 PURPOSE* 653 non-null object
dtypes: float64(1), object(6)
memory usage: 63.3+ KB
The PURPOSE column has 653 non-null values, which means it contains (1155 - 653) = 502 missing values. The MILES column holds continuous numeric data, while all other columns hold categorical data.
Q6. Check for missing values. (2 points)
Note: Output should contain only one boolean value
In [12]:
ud.isnull()
Out[12]:
START_DATE* END_DATE* CATEGORY* START* STOP* MILES* PURPOSE*
0 False False False False False False False
1 False False False False False False True
2 False False False False False False False
3 False False False False False False False
4 False False False False False False False
... ... ... ... ... ... ... ...
1150 False False False False False False False
1151 False False False False False False False
1152 False False False False False False False
1153 False False False False False False False
1154 False False False False False False False
1155 rows × 7 columns
In [13]:
ud.isnull().sum()
Out[13]:
START_DATE* 0
END_DATE* 0
CATEGORY* 0
START* 0
STOP* 0
MILES* 0
PURPOSE* 502
dtype: int64
Missing values are present only in the PURPOSE column of the dataset.
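Since the note asks for a single boolean, the element-wise mask from `ud.isnull()` can also be collapsed to one value; a minimal sketch, assuming the same `ud` dataframe loaded above:
In [ ]:
# Collapse the null mask to one boolean: True if any value anywhere is missing.
ud.isnull().values.any()   # True here, because PURPOSE* has 502 missing values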
Q7. How many missing values are present in the entire dataset? (2 points)
In [174]:
ud.isnull().sum()
Out[174]:
START_DATE 0
END_DATE 0
CATEGORY 0
START 0
STOP 0
MILES 0
PURPOSE 502
dtype: int64
The total number of missing values in the entire dataset is 502.
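To get that single number directly, rather than reading it off the per-column counts, one more `sum()` collapses the Series; a sketch assuming the same `ud`:
In [ ]:
# Sum the per-column null counts into one grand total of missing cells.
ud.isnull().sum().sum()   # 502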
Q8. Get the summary of the original data. (2 points).
Hint: Summary includes - count, mean, std, min, 25%, 50%, 75% and max
In [99]:
ud.describe(include="all").T
Out[99]:
count unique top freq mean std min 25% 50% 75% max
START_DATE 1155 1154 6/28/2016 23:34 2 NaN NaN NaN NaN NaN NaN NaN
END_DATE 1155 1154 6/28/2016 23:59 2 NaN NaN NaN NaN NaN NaN NaN
CATEGORY 1155 2 Business 1078 NaN NaN NaN NaN NaN NaN NaN
START 1155 176 Cary 201 NaN NaN NaN NaN NaN NaN NaN
STOP 1155 187 Cary 203 NaN NaN NaN NaN NaN NaN NaN
MILES 1155 NaN NaN NaN 10.5668 21.5791 0.5 2.9 6 10.4 310.3
PURPOSE 653 10 Meeting 187 NaN NaN NaN NaN NaN NaN NaN
Q9. Drop the missing values and store the data in a new dataframe (name it "df"). (2 points)
Note: Dataframe "df" will not contain any missing value
In [178]:
df = ud.dropna()
df
Out[178]:
START_DATE END_DATE CATEGORY START STOP MILES PURPOSE
0 01-01-2016 21:11 01-01-2016 21:17 Business Fort Pierce Fort Pierce 5.1 Meal/Entertain
2 01-02-2016 20:25 01-02-2016 20:38 Business Fort Pierce Fort Pierce 4.8 Errand/Supplies
3 01-05-2016 17:31 01-05-2016 17:45 Business Fort Pierce Fort Pierce 4.7 Meeting
4 01-06-2016 14:42 01-06-2016 15:49 Business Fort Pierce West Palm Beach 63.7 Customer Visit
5 01-06-2016 17:15 01-06-2016 17:19 Business West Palm Beach West Palm Beach 4.3 Meal/Entertain
... ... ... ... ... ... ... ...
1150 12/31/2016 1:07 12/31/2016 1:14 Business Karachi Karachi 0.7 Meeting
1151 12/31/2016 13:24 12/31/2016 13:42 Business Karachi Unknown Location 3.9 Temporary Site
1152 12/31/2016 15:03 12/31/2016 15:38 Business Unknown Location Unknown Location 16.2 Meeting
1153 12/31/2016 21:32 12/31/2016 21:50 Business Katunayake Gampaha 6.4 Temporary Site
1154 12/31/2016 22:08 12/31/2016 23:51 Business Gampaha Ilukwatta 48.2 Temporary Site
653 rows × 7 columns
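As a quick sanity check (a sketch assuming the `df` created above), the cleaned dataframe should contain no missing values at all:
In [ ]:
# Every column should now report zero nulls.
df.isnull().sum().sum()   # expected: 0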
Q10. Check the information of the dataframe (df). (1 point)
Hint: Information includes - total number of columns, variable data types, number of non-null values in a variable, and memory usage
In [73]:
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 653 entries, 0 to 1154
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 START_DATE* 653 non-null object
1 END_DATE* 653 non-null object
2 CATEGORY* 653 non-null object
3 START* 653 non-null object
4 STOP* 653 non-null object
5 MILES* 653 non-null float64
6 PURPOSE* 653 non-null object
dtypes: float64(1), object(6)
memory usage: 40.8+ KB
Q11. Get the unique start locations. (2 points)
Note: This question is based on the dataframe with no 'NA' values
In [106]:
df.columns = df.columns.str.replace('*', '', regex=False)   # strip the literal '*' from the column names
In [107]:
df.head()
Out[107]:
START_DATE END_DATE CATEGORY START STOP MILES PURPOSE
0 01-01-2016 21:11 01-01-2016 21:17 Business Fort Pierce Fort Pierce 5.1 Meal/Entertain
2 01-02-2016 20:25 01-02-2016 20:38 Business Fort Pierce Fort Pierce 4.8 Errand/Supplies
3 01-05-2016 17:31 01-05-2016 17:45 Business Fort Pierce Fort Pierce 4.7 Meeting
4 01-06-2016 14:42 01-06-2016 15:49 Business Fort Pierce West Palm Beach 63.7 Customer Visit
5 01-06-2016 17:15 01-06-2016 17:19 Business West Palm Beach West Palm Beach 4.3 Meal/Entertain
In [108]:
df.START.unique()
Out[108]:
array(['Fort Pierce', 'West Palm Beach', 'Cary', 'Jamaica', 'New York',
'Elmhurst', 'Midtown', 'East Harlem', 'Flatiron District',
'Midtown East', 'Hudson Square', 'Lower Manhattan',
"Hell's Kitchen", 'Downtown', 'Gulfton', 'Houston', 'Eagan Park',
'Morrisville', 'Durham', 'Farmington Woods', 'Lake Wellingborough',
'Fayetteville Street', 'Raleigh', 'Whitebridge', 'Hazelwood',
'Fairmont', 'Meredith Townes', 'Apex', 'Chapel Hill', 'Northwoods',
'Edgehill Farms', 'Eastgate', 'East Elmhurst', 'Long Island City',
'Katunayaka', 'Colombo', 'Nugegoda', 'Unknown Location',
'Islamabad', 'R?walpindi', 'Noorpur Shahan', 'Preston',
'Heritage Pines', 'Tanglewood', 'Waverly Place', 'Wayne Ridge',
'Westpark Place', 'East Austin', 'The Drag', 'South Congress',
'Georgian Acres', 'North Austin', 'West University', 'Austin',
'Katy', 'Sharpstown', 'Sugar Land', 'Galveston', 'Port Bolivar',
'Washington Avenue', 'Briar Meadow', 'Latta', 'Jacksonville',
'Lake Reams', 'Orlando', 'Kissimmee', 'Daytona Beach', 'Ridgeland',
'Florence', 'Meredith', 'Holly Springs', 'Chessington', 'Burtrose',
'Parkway', 'Mcvan', 'Capitol One', 'University District',
'Seattle', 'Redmond', 'Bellevue', 'San Francisco', 'Palo Alto',
'Sunnyvale', 'Newark', 'Menlo Park', 'Old City', 'Savon Height',
'Kilarney Woods', 'Townes at Everett Crossing', 'Huntington Woods',
'Weston', 'Seaport', 'Medical Centre', 'Rose Hill', 'Soho',
'Tribeca', 'Financial District', 'Oakland', 'Emeryville',
'Berkeley', 'Kenner', 'CBD', 'Lower Garden District', 'Storyville',
'New Orleans', 'Chalmette', 'Arabi', 'Pontchartrain Shores',
'Metairie', 'Summerwinds', 'Parkwood', 'Banner Elk', 'Boone',
'Stonewater', 'Lexington Park at Amberly', 'Winston Salem',
'Asheville', 'Topton', 'Renaissance', 'Santa Clara', 'Ingleside',
'West Berkeley', 'Mountain View', 'El Cerrito', 'Krendle Woods',
'Fuquay-Varina', 'Rawalpindi', 'Lahore', 'Karachi', 'Katunayake',
'Gampaha'], dtype=object)
In [109]:
df.START.nunique()
Out[109]:
131
Q12. What is the total number of unique start locations? (2 points)
Note: Use the original dataframe without dropping 'NA' values
In [110]:
ud.columns = ud.columns.str.replace('*', '', regex=False)   # strip the literal '*' from the column names
In [111]:
ud.head()
Out[111]:
START_DATE END_DATE CATEGORY START STOP MILES PURPOSE
0 01-01-2016 21:11 01-01-2016 21:17 Business Fort Pierce Fort Pierce 5.1 Meal/Entertain
1 01-02-2016 01:25 01-02-2016 01:37 Business Fort Pierce Fort Pierce 5.0 NaN
2 01-02-2016 20:25 01-02-2016 20:38 Business Fort Pierce Fort Pierce 4.8 Errand/Supplies
3 01-05-2016 17:31 01-05-2016 17:45 Business Fort Pierce Fort Pierce 4.7 Meeting
4 01-06-2016 14:42 01-06-2016 15:49 Business Fort Pierce West Palm Beach 63.7 Customer Visit
In [112]:
ud.START.nunique()
Out[112]:
176
Q13. What is the total number of unique stop locations? (2 points)
Note: Use the original dataframe without dropping 'NA' values.
In [113]:
ud.STOP.nunique()
Out[113]:
187
Q14. Display all Uber trips that have San Francisco as the starting point. (2 points)
Note: Use the original dataframe without dropping the 'NA' values.
In [114]:
ud[ud['START'] == 'San Francisco']
Out[114]:
START_DATE END_DATE CATEGORY START STOP MILES PURPOSE
362 05-09-2016 14:39 05-09-2016 15:06 Business San Francisco Palo Alto 20.5 Between Offices
440 6/14/2016 16:09 6/14/2016 16:39 Business San Francisco Emeryville 11.6 Meeting
836 10/19/2016 14:02 10/19/2016 14:31 Business San Francisco Berkeley 10.8 NaN
917 11-07-2016 19:17 11-07-2016 19:57 Business San Francisco Berkeley 13.2 Between Offices
919 11-08-2016 12:16 11-08-2016 12:49 Business San Francisco Berkeley 11.3 Meeting
927 11-09-2016 18:40 11-09-2016 19:17 Business San Francisco Oakland 12.7 Customer Visit
933 11-10-2016 15:17 11-10-2016 15:22 Business San Francisco Oakland 9.9 Temporary Site
966 11/15/2016 20:44 11/15/2016 21:00 Business San Francisco Berkeley 11.8 Temporary Site
Q15. What is the most popular starting point for the Uber drivers? (2
points)
Note: Use the original dataframe without dropping the 'NA' values.
Hint: Popular means the place that is visited the most.
In [185]:
ud.START.value_counts().head(1)
Out[185]:
Cary 201
Name: START, dtype: int64
'Cary' is the most popular starting point for the Uber drivers.
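An equivalent way to pull out just the label, without the count, is `idxmax()`; a sketch assuming the renamed `ud` columns:
In [ ]:
# idxmax() on value_counts() returns the most frequent value itself.
ud['START'].value_counts().idxmax()   # 'Cary'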
Q16. What is the most popular dropping point for the Uber drivers? (2
points)
Note: Use the original dataframe without dropping the 'NA' values.
Hint: Popular means the place that is visited the most
In [186]:
ud.STOP.value_counts().head(1)
Out[186]:
Cary 203
Name: STOP, dtype: int64
'Cary' is the most popular dropping point for the Uber drivers.
Q17. What is the most frequent route taken by Uber drivers? (3 points)
Note: This question is based on the new dataframe with no 'NA' values.
Hint: Print the most frequent route taken by Uber drivers (Route = combination of START & STOP points present in the dataset).
In [255]:
sta = df["START"].sort_values()
stp = df["STOP"].sort_values()
tot = sta + stp
tot.value_counts().head(5)
Out[255]:
CaryMorrisville 52
MorrisvilleCary 51
CaryCary 44
Unknown LocationUnknown Location 30
CaryDurham 30
dtype: int64
In [279]:
# Cross-check: the largest START/STOP pair count matches the top route above.
df.groupby('START')['STOP'].value_counts(ascending=False).max()
Out[279]:
52
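A cleaner way to build the route strings is to add an explicit separator, which keeps the start and stop readable; a sketch assuming the same `df`:
In [ ]:
# Build one 'START -> STOP' string per trip and count how often each route occurs.
routes = df['START'] + ' -> ' + df['STOP']
routes.value_counts().head(1)   # Cary -> Morrisville, 52 trips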
Q18. Display all types of purposes for the trip in an array. (2 points)
Note: This question is based on the new dataframe with no 'NA' values.
In [139]:
df.PURPOSE.unique()
Out[139]:
array(['Meal/Entertain', 'Errand/Supplies', 'Meeting', 'Customer Visit',
'Temporary Site', 'Between Offices', 'Charity ($)', 'Commute',
'Moving', 'Airport/Travel'], dtype=object)
Q19. Plot a bar graph of Purpose vs Miles (Distance). What can you infer from the plot? (2 + 2 points)
Note: Use the original dataframe without dropping the 'NA' values.
Hint: You have to plot the total (sum of) miles per purpose.
In [147]:
plt.figure(figsize=(12, 5))
sns.barplot(x=ud['PURPOSE'], y=ud['MILES'])   # default estimator plots the mean miles per purpose
plt.xticks(rotation=45)
plt.show()
With seaborn's default (mean) estimator, the Commute purpose shows the highest average miles per trip.
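The hint asks for the total miles per purpose rather than the mean that `sns.barplot` shows by default; a minimal sketch of that aggregation and plot, assuming the renamed `ud` columns:
In [ ]:
# Total (not average) miles clocked for each purpose, plotted as a bar chart.
miles_by_purpose = ud.groupby('PURPOSE')['MILES'].sum().sort_values(ascending=False)
plt.figure(figsize=(12, 5))
miles_by_purpose.plot(kind='bar')
plt.ylabel('Total miles')
plt.xticks(rotation=45)
plt.show()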
Q20. Display a dataframe of Purpose and the total distance travelled for that
particular Purpose. (3 points)
Note: Use the original dataframe without dropping "NA" values
In [193]:
ud[['PURPOSE', 'MILES']]
Out[193]:
PURPOSE MILES
0 Meal/Entertain 5.1
1 None 5.0
2 Errand/Supplies 4.8
3 Meeting 4.7
4 Customer Visit 63.7
... ... ...
1150 Meeting 0.7
1151 Temporary Site 3.9
1152 Meeting 16.2
1153 Temporary Site 6.4
1154 Temporary Site 48.2
1155 rows × 2 columns
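The cell above lists the raw columns; to get one row per Purpose with its total distance, a groupby aggregation can be used. A sketch, assuming the renamed `ud` columns (trips with a missing PURPOSE are left out of the grouping by default):
In [ ]:
# One row per purpose with the summed miles for that purpose.
ud.groupby('PURPOSE')['MILES'].sum().reset_index()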
Q21. Generate a plot showing count of trips vs category of trips. What can you infer from the plot? (2 + 1 points)
Note: Use the original dataframe without dropping the 'NA' values.
In [259]:
sns.countplot(x=ud['CATEGORY'])
plt.xticks(rotation = 45)
plt.show()
The maximum count of trips falls under the Business category, i.e. Uber is used far more for business purposes than for personal ones.
Q22. What percentage of Miles was clocked under the Business category and what percentage under the Personal category? (3 points)
Note: Use the original dataframe without dropping the 'NA' values.
In [284]:
ud.groupby('CATEGORY')['MILES'].sum()
Out[284]:
CATEGORY
Business 11487.0
Personal 717.7
Name: MILES, dtype: float64
In [218]:
[11487.0/(11487.0 + 717.7)] # For Business Purpose
Out[218]:
[0.9411947856153776]
In [219]:
[717.7/(11487.0 + 717.7)] # For Personal Purpose
Out[219]:
[0.058805214384622315]
94.12% of miles were clocked under the Business category and 5.88% under the Personal category.
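The same split can be computed in one step from the grouped totals; a sketch assuming the renamed `ud` columns:
In [ ]:
# Share of total miles per category, in percent.
miles_by_category = ud.groupby('CATEGORY')['MILES'].sum()
(miles_by_category / miles_by_category.sum() * 100).round(2)   # Business ~94.12, Personal ~5.88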
THE END