Pandas Exercise

Uploaded by vesike3421

TASK: Run the following code to read in the "hotel_booking_data.csv" file. Feel free to explore the file a bit before continuing with the rest of the exercise.

In [1]: import pandas as pd

In [2]: hotels = pd.read_csv("C:\\Users\\HP\\Desktop\\Python\\Code\\UNZIP_FOR_NOTEBOOKS_FINAL\\03-Pandas

In [3]: hotels.head()

Out[3]:
hotel is_canceled lead_time arrival_date_year arrival_date_month arrival_date_week_number arrival_date_day_of_m
0 Resort Hotel 0 342 2015 July 27
1 Resort Hotel 0 737 2015 July 27
2 Resort Hotel 0 7 2015 July 27
3 Resort Hotel 0 13 2015 July 27
4 Resort Hotel 0 14 2015 July 27

5 rows × 36 columns

TASK: How many rows are there?

In [4]: # CODE HERE

len(hotels)

Out[4]: 119390
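`len(hotels)` returns the number of rows. As a small sketch on a made-up two-column frame (hypothetical data, not the hotel file), `DataFrame.shape` gives the same row count together with the column count:

```python
import pandas as pd

# Hypothetical toy data standing in for the hotel bookings file
df = pd.DataFrame({"hotel": ["Resort Hotel", "City Hotel", "City Hotel"],
                   "lead_time": [342, 7, 13]})

n_rows = len(df)                 # number of rows
n_rows_shape, n_cols = df.shape  # (rows, columns)
print(n_rows, n_rows_shape, n_cols)  # 3 3 2
```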

TASK: Is there any missing data? If so, which column has the most missing data?

In [5]: # CODE HERE
hotels.isnull().sum()

Out[5]: hotel 0
is_canceled 0
lead_time 0
arrival_date_year 0
arrival_date_month 0
arrival_date_week_number 0
arrival_date_day_of_month 0
stays_in_weekend_nights 0
stays_in_week_nights 0
adults 0
children 4
babies 0
meal 0
country 488
market_segment 0
distribution_channel 0
is_repeated_guest 0
previous_cancellations 0
previous_bookings_not_canceled 0
reserved_room_type 0
assigned_room_type 0
booking_changes 0
deposit_type 0
agent 16340
company 112593
days_in_waiting_list 0
customer_type 0
adr 0
required_car_parking_spaces 0
total_of_special_requests 0
reservation_status 0
reservation_status_date 0
name 0
email 0
phone-number 0
credit_card 0
dtype: int64

In [6]: print(f"Yes, missing data, company column missing: {hotels['company'].isna().sum()} rows.")

Yes, missing data, company column missing: 112593 rows.
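Reading the largest count off the `isnull().sum()` listing works, but the second half of the task can also be answered programmatically with `Series.idxmax()`, which returns the label of the largest value. A minimal sketch on a hypothetical frame with gaps in a few columns:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with missing values in three columns
df = pd.DataFrame({
    "children": [0, np.nan, 1, 2],
    "agent": [9.0, np.nan, np.nan, 240.0],
    "company": [np.nan, np.nan, np.nan, 40.0],
})

missing = df.isnull().sum()         # missing values per column
worst_column = missing.idxmax()     # column label with the largest count
print(worst_column, missing.max())  # company 3
```

On the real data, `hotels.isnull().sum().idxmax()` should point at the same column the listing above shows.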

TASK: Drop the "company" column from the dataset.

In [7]: hotels.drop(columns=['company'],inplace=True)
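With `inplace=True`, `drop()` mutates the frame and returns `None`, and re-running the cell raises a `KeyError` because the column is already gone. A sketch of an alternative on a hypothetical one-row frame: reassign the returned copy and pass `errors="ignore"` so that a second run is harmless.

```python
import pandas as pd

# Hypothetical one-row frame with an empty "company" column
df = pd.DataFrame({"hotel": ["Resort Hotel"], "company": [None], "adr": [0.0]})

# drop() returns a new DataFrame; reassigning the result avoids inplace=True
df = df.drop(columns=["company"])

# errors="ignore" makes a re-run harmless once the column is already gone
df = df.drop(columns=["company"], errors="ignore")
print(list(df.columns))  # ['hotel', 'adr']
```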

TASK: What are the top 5 most common country codes in the dataset?

In [9]: hotels['country'].value_counts()[:5]

Out[9]: PRT 48590
GBR 12129
FRA 10415
ESP 8568
DEU 7287
Name: country, dtype: int64
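`value_counts()` already sorts in descending order, so `.head(5)` is an equivalent, slightly more explicit way to take the top five. The `country` column has 488 missing values, and `value_counts()` drops NaN by default; `dropna=False` shows them as their own row. A sketch on hypothetical country codes:

```python
import numpy as np
import pandas as pd

# Hypothetical country codes with one missing value
country = pd.Series(["PRT", "GBR", "PRT", np.nan, "FRA", "PRT", "GBR"])

top = country.value_counts().head(2)               # NaN is dropped by default
with_missing = country.value_counts(dropna=False)  # NaN gets its own row
print(top)
print(with_missing.sum() == len(country))
```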

TASK: What is the name of the person who paid the highest ADR (average daily rate)? How much was their ADR?

In [10]: # CODE HERE
hotels.sort_values('adr',ascending=False)[['name','adr']].iloc[0]

Out[10]: name Daniel Walter
adr 5400.0
Name: 48515, dtype: object
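Sorting the whole frame only to read the first row does more work than needed. `Series.idxmax()` finds the index label of the maximum in a single pass (returning the first one if there are ties), and `.loc` then selects that row. A sketch with hypothetical bookings:

```python
import pandas as pd

# Hypothetical bookings; only the highest ADR matters here
df = pd.DataFrame({"name": ["Ann Lee", "Bo Chan", "Cy Dunn"],
                   "adr": [95.0, 5400.0, 120.5]})

# Index label of the max-ADR row, then pick the two columns of interest
row = df.loc[df["adr"].idxmax(), ["name", "adr"]]
print(row["name"], row["adr"])  # Bo Chan 5400.0
```

Applied to this notebook, the same pattern would be `hotels.loc[hotels['adr'].idxmax(), ['name', 'adr']]`.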

TASK: The adr is the average daily rate for a person's stay at the hotel. What is the mean adr across all the hotel stays in the dataset?

In [43]: # CODE HERE

round(hotels['adr'].mean(),2)

Out[43]: 101.83

TASK: What is the average (mean) number of nights for a stay across the entire data set? Feel free to round this to 2 decimal points.

In [46]: # CODE HERE

total_night_stay=hotels['stays_in_week_nights']+hotels['stays_in_weekend_nights']

In [47]: round(total_night_stay.mean(),2)

Out[47]: 3.43

TASK: What is the average total cost for a stay in the dataset? Not the average daily cost, but the total stay cost. (You will need to calculate the total cost yourself, using the ADR together with the weekday-night and weekend-night stays.) Feel free to round this to 2 decimal points.

In [49]: # CODE HERE

total_cost=hotels['adr']*total_night_stay

In [52]: round(total_cost.mean(),2)

Out[52]: 357.85
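Because `adr` is a per-night rate, the total cost of one booking is its ADR multiplied by its own number of nights, and the average total cost is the mean of those per-booking products. That is generally not the same as the mean ADR times the mean number of nights. A minimal sketch with three made-up bookings, where the arithmetic can be checked by hand: 100 × 5 = 500, 80 × 2 = 160 and 0 × 1 = 0, so the mean is 660 / 3 = 220.

```python
import pandas as pd

# Three hypothetical bookings
df = pd.DataFrame({
    "adr": [100.0, 80.0, 0.0],
    "stays_in_weekend_nights": [2, 0, 1],
    "stays_in_week_nights": [3, 2, 0],
})

nights = df["stays_in_weekend_nights"] + df["stays_in_week_nights"]  # 5, 2, 1
total_cost = df["adr"] * nights                                      # 500, 160, 0
print(round(total_cost.mean(), 2))  # 220.0

# Mean ADR times mean nights gives a different number (60 * 8/3 = 160)
print(df["adr"].mean() * nights.mean())
```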

TASK: What are the names and emails of people who made exactly 5 "Special Requests"?

In [58]: # CODE HERE
hotels[hotels['total_of_special_requests']==5][['name','email']]

Out[58]: name email
7860 Amanda Harper [email protected]
11125 Laura Sanders [email protected]
14596 Tommy Ortiz [email protected]
14921 Gilbert Miller [email protected]
14922 Timothy Torres [email protected]
24630 Jennifer Weaver [email protected]
27288 Crystal Horton [email protected]
27477 Brittney Burke [email protected]
29906 Cynthia Cabrera [email protected]
29949 Sarah Floyd [email protected]
32267 Michelle Villa [email protected]
39027 Nichole Hebert [email protected]
39129 Lindsey Mckenzie [email protected]
39525 Ashley Edwards [email protected]
70114 Christopher Torres [email protected]
78819 Mrs. Tara Sullivan DVM [email protected]
78820 Michaela Brown [email protected]
78822 Kurt Maldonado MD [email protected]
97072 Jason Richardson [email protected]
97099 Terri Hurley [email protected]
97261 Mrs. Caitlin Webb [email protected]
98410 Holly Arroyo [email protected]
98674 Denise Campbell [email protected]
99887 Michael Smith [email protected]
99888 Dr. Trevor Sellers [email protected]
101569 Kayla Murphy [email protected]
102061 Taylor Martinez [email protected]
109511 Charles Wilson [email protected]
109590 Tyler Allison [email protected]
110082 Matthew Bailey [email protected]
110083 Charlotte Acevedo [email protected]
111909 Darrell Brennan [email protected]
111911 Melinda Jensen [email protected]
113915 Terry Arnold [email protected]
114770 Mary Nguyen [email protected]
114909 Lindsay Cuevas [email protected]
116455 Cynthia Hernandez [email protected]
116457 Angela Hawkins [email protected]
118817 Sue Lawson [email protected]
119161 Alyssa Richards [email protected]
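The filter above uses two chained indexing steps. An equivalent form does the row mask and the column selection in a single `.loc` call, which reads a little more directly and avoids chained indexing if the result is later modified. A sketch with hypothetical names and placeholder `example.com` emails:

```python
import pandas as pd

# Hypothetical guests; the emails are placeholders on a reserved domain
df = pd.DataFrame({
    "name": ["Ann Lee", "Bo Chan", "Cy Dunn"],
    "email": ["ann@example.com", "bo@example.com", "cy@example.com"],
    "total_of_special_requests": [5, 2, 5],
})

# Boolean row mask and column list in one .loc call
five = df.loc[df["total_of_special_requests"] == 5, ["name", "email"]]
print(five)
```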

TASK: What percentage of hotel stays were classified as "repeat guests"? (Do not base this on the person's name, but instead on the is_repeated_guest column.)

In [77]: round((hotels['is_repeated_guest']==1).sum()/len(hotels['is_repeated_guest'])*100,2)

Out[77]: 3.19
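Since `is_repeated_guest` holds only 0 and 1 (assumed here from how the task uses it), its mean is already the fraction of repeat guests, so `.mean() * 100` gives the same percentage as counting the 1s and dividing by the length. A sketch on a hypothetical 0/1 column:

```python
import pandas as pd

# Hypothetical 0/1 flags: 2 repeat guests out of 8 stays
is_repeated_guest = pd.Series([0, 1, 0, 0, 1, 0, 0, 0])

pct_sum = (is_repeated_guest == 1).sum() / len(is_repeated_guest) * 100
pct_mean = is_repeated_guest.mean() * 100   # valid only because values are 0/1
print(round(pct_mean, 2))  # 25.0
```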


TASK: What are the top 5 most common last names in the dataset? Bonus: Can you figure this out in one line of pandas code? (For simplicity, treat a title such as MD as a last name; for example, Caroline Conley MD can be said to have the last name MD.)

In [80]: #CODE HERE

first_last_name=hotels['name'].str.split()

In [82]: last_name=first_last_name.str[-1]

In [86]: hotels['name'].apply(lambda name: name.split()[1]).value_counts()[:5]

Out[86]: Smith 2510
Johnson 1998
Williams 1628
Jones 1441
Brown 1433
Name: name, dtype: int64
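Note that `name.split()[1]` in cell 86 takes the second whitespace-separated token, so for names such as "Mrs. Tara Sullivan DVM" or "Mr. Jeffrey Cross" it returns "Tara" or "Jeffrey" rather than a surname. The task's convention (a trailing title like MD counts as the last name) corresponds to the last token, which is what `last_name` from cell 82 already holds; the one-line form would be `hotels['name'].str.split().str[-1].value_counts().head(5)`. Out[86] was produced with `[1]`, so the counts may differ with `[-1]`. A sketch on a few names taken from the output above, plus one made-up "Anna Smith" so that one surname clearly ranks first:

```python
import pandas as pd

# Names from this notebook's output, plus a hypothetical "Anna Smith"
names = pd.Series(["Mrs. Tara Sullivan DVM", "Mr. Jeffrey Cross",
                   "Kurt Maldonado MD", "Michael Smith", "Anna Smith"])

tokens = names.str.split()
second_token = tokens.str[1]    # what split()[1] returns: 'Tara', 'Jeffrey', ...
last_token = tokens.str[-1]     # 'DVM', 'Cross', 'MD', 'Smith', 'Smith'

top_last = last_token.value_counts().head(5)
print(top_last)
```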

TASK: What are the names of the people who booked the most children and babies for their stay? (Don't worry about whether they canceled; only consider the number of people reported at the time of their reservation.)

In [11]: hotels['total_kids']=hotels['babies']+hotels['children']

In [17]: hotels.sort_values('total_kids',ascending=False)[['name','adults','total_kids','babies','childre

Out[17]:
name adults total_kids babies children
328 Jamie Ramirez 2 10.0 0 10.0
46619 Nicholas Parker 2 10.0 10 0.0
78656 Marc Robinson 1 9.0 9 0.0
19718 Mr. Jeffrey Cross 2 3.0 0 3.0
107837 Albert French 2 3.0 2 1.0
... ... ... ... ... ...
119389 Ariana Michael 2 0.0 0 0.0
40600 Craig Campos 2 NaN 0 NaN
40667 David Murphy 2 NaN 0 NaN
40679 Frank Burton 3 NaN 0 NaN
41160 Jerry Roberts 2 NaN 0 NaN

119390 rows × 5 columns
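Two details show up in the output above: the 4 missing `children` values make `total_kids` NaN for those rows, and the top two bookings are tied at 10. A sketch of one way to handle both, using the values from four rows shown above: `Series.add(..., fill_value=0)` treats a missing count as 0 (an assumption about what a missing value means), and `DataFrame.nlargest(..., keep="all")` keeps every row tied at the cutoff.

```python
import numpy as np
import pandas as pd

# Values copied from four rows of Out[17]
df = pd.DataFrame({
    "name": ["Jamie Ramirez", "Nicholas Parker", "Marc Robinson", "Craig Campos"],
    "babies": [0, 10, 9, 0],
    "children": [10.0, 0.0, 0.0, np.nan],
})

# Assumption: a missing children count is treated as 0 instead of propagating NaN
df["total_kids"] = df["babies"].add(df["children"], fill_value=0)

# keep="all" retains every row tied with the top value
top = df.nlargest(1, "total_kids", keep="all")
print(top[["name", "total_kids"]])
```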

TASK: What are the top 3 most common area codes in the phone numbers? (The area code is the first 3 digits.)

In [18]: #CODE HERE

area_codes=hotels['phone-number'].str[:3]

In [20]: area_codes.value_counts()[:3]

Out[20]: 799 168
185 167
541 166
Name: phone-number, dtype: int64

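The two cells above can be chained into one line. `.str[:3]` and `.str.slice(0, 3)` both take the first three characters, which matches the task only if every number starts with its three-digit area code, as the task states. A sketch on hypothetical phone numbers (the real column's exact format is not shown in this notebook):

```python
import pandas as pd

# Hypothetical phone numbers, assumed to start with the area code
phones = pd.Series(["799-555-0101", "185-555-0102", "799-555-0103",
                    "541-555-0104", "799-555-0105", "185-555-0106"])

# First three characters, counted, top three kept
top_area_codes = phones.str.slice(0, 3).value_counts().head(3)
print(top_area_codes)
```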
TASK: How many arrivals took place between the 1st and the 15th of the month (inclusive of 1 and 15)? Bonus: Can you do this in one line of pandas code?

In [21]: #CODE HERE

hotels['arrival_date_day_of_month'].apply(lambda day:day in range(1,16)).sum()

Out[21]: 58152
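The `apply` with a Python `lambda` calls the function once per row. `Series.between(1, 15)` produces the same boolean mask in vectorized form and, by default, includes both endpoints, which matches the "inclusive of 1 and 15" requirement. A sketch on hypothetical day numbers:

```python
import pandas as pd

# Hypothetical arrival days of the month
day_of_month = pd.Series([1, 15, 16, 31, 7, 14])

# between() includes both endpoints by default
first_half = day_of_month.between(1, 15).sum()
print(first_half)  # 4
```

On the notebook's data this would read `hotels['arrival_date_day_of_month'].between(1, 15).sum()`.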

HARD BONUS TASK: Create a table of counts for each day of the week that people arrived. (E.g. 5000 arrivals were on a Monday, 3000 were on a Tuesday, etc.)

In [47]: def convert_to_proper(day, month, year):
    # Build a "day-Month-year" string such as "1-July-2015"
    return f'{day}-{month}-{year}'

In [50]: import numpy as np

hotels['date'] = np.vectorize(convert_to_proper)(hotels['arrival_date_day_of_month'],
                                                 hotels['arrival_date_month'],
                                                 hotels['arrival_date_year'])

In [52]: date_to_day=hotels['date']

In [53]: date_to_day=pd.to_datetime(date_to_day)

In [55]: date_to_day.dt.day_name().value_counts()

Out[55]: Friday 19631
Thursday 19254
Monday 18171
Saturday 18055
Wednesday 16139
Sunday 14141
Tuesday 13999
Name: date, dtype: int64
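The same table can be built without `np.vectorize` by concatenating the string columns directly and parsing them with an explicit `format`, which avoids relying on pandas to guess the date layout. Reindexing the counts to calendar order makes the result read like a weekly table. A sketch on three hypothetical arrivals (1 July 2015 was a Wednesday, 4 July 2015 a Saturday, 1 January 2016 a Friday):

```python
import pandas as pd

# Hypothetical arrivals using the same three columns as the hotel data
df = pd.DataFrame({
    "arrival_date_year": [2015, 2015, 2016],
    "arrival_date_month": ["July", "July", "January"],
    "arrival_date_day_of_month": [1, 4, 1],
})

# Build "1-July-2015"-style strings column-wise, then parse with an explicit format
date_str = (df["arrival_date_day_of_month"].astype(str) + "-"
            + df["arrival_date_month"] + "-"
            + df["arrival_date_year"].astype(str))
arrival = pd.to_datetime(date_str, format="%d-%B-%Y")

counts = arrival.dt.day_name().value_counts()

# Calendar order, with 0 for weekdays that have no arrivals
order = ["Monday", "Tuesday", "Wednesday", "Thursday",
         "Friday", "Saturday", "Sunday"]
table = counts.reindex(order, fill_value=0)
print(table)
```

`%B` matches full English month names, which assumes the default (English) locale.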
