Pandas Data Analysis Tutorial

Uploaded by

phatht23413e

pandas-tutorial

November 29, 2024

[7]: import pandas as pd
import numpy as np

# Load the tab-separated Chipotle orders file
df = pd.read_csv("C:/Local Code/Resources/Data/chipotle.tsv", sep="\t")

[8]: df.head()

[8]: order_id quantity item_name \

0 1 1 Chips and Fresh Tomato Salsa
1 1 1 Izze
2 1 1 Nantucket Nectar
3 1 1 Chips and Tomatillo-Green Chili Salsa
4 2 2 Chicken Bowl

choice_description item_price
0 NaN $2.39
1 [Clementine] $3.39
2 [Apple] $3.39
3 NaN $2.39
4 [Tomatillo-Red Chili Salsa (Hot), [Black Beans… $16.98
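As a small self-contained illustration of the loading step, the sketch below reads tab-separated text from an in-memory string instead of a file on disk. The three rows are made-up sample data that only mimic the column layout shown above, and the names `sample_tsv` and `sample` are hypothetical.

```python
import io

import pandas as pd

# Hypothetical three-row sample in the same tab-separated layout as the file above.
# The first row leaves choice_description empty, which pandas reads as NaN.
sample_tsv = (
    "order_id\tquantity\titem_name\tchoice_description\titem_price\n"
    "1\t1\tChips and Fresh Tomato Salsa\t\t$2.39\n"
    "1\t1\tIzze\t[Clementine]\t$3.39\n"
    "2\t2\tChicken Bowl\t[Tomatillo-Red Chili Salsa (Hot)]\t$16.98\n"
)

# sep="\t" tells read_csv to split fields on tabs instead of commas.
sample = pd.read_csv(io.StringIO(sample_tsv), sep="\t")

print(sample.shape)
print(sample.head(2))
```

Passing a file path, as the notebook does, works the same way; only the first argument changes.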

[9]: df.info()  # show information for each column

# choice_description contains many NaN values, so only 3376 of its entries are usable

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4622 entries, 0 to 4621
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 order_id 4622 non-null int64
1 quantity 4622 non-null int64
2 item_name 4622 non-null object
3 choice_description 3376 non-null object
4 item_price 4622 non-null object
dtypes: int64(2), object(3)
memory usage: 180.7+ KB
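The non-null counts reported by `info()` imply that `choice_description` is missing in 4622 - 3376 = 1246 rows. A minimal sketch of how such counts can be computed directly, using a small made-up frame (the data and the name `sample` are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical sample with two missing choice descriptions.
sample = pd.DataFrame({
    "item_name": ["Izze", "Chips", "Chicken Bowl", "Chips"],
    "choice_description": ["[Clementine]", np.nan, "[Black Beans]", np.nan],
})

# isna() marks missing cells; summing the booleans counts them per column.
missing_per_column = sample.isna().sum()

# count() returns the number of non-null cells, the figure that info() reports.
non_null_per_column = sample.count()

print(missing_per_column)
print(non_null_per_column)
```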

[10]: df.columns  # an Index object
# to convert it to a plain list, just wrap it in list()
list(df.columns)

[10]: ['order_id', 'quantity', 'item_name', 'choice_description', 'item_price']

[11]: df.index  # the row index; shows the range of row labels (start and stop)

[11]: RangeIndex(start=0, stop=4622, step=1)
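The two cells above can be tried on a tiny frame. This sketch uses made-up data (the name `sample` is hypothetical) and shows that the column labels live in an `Index` object, that `tolist()` is an equivalent way to get a plain list, and that a default row index is a `RangeIndex` with `start`, `stop` and `step` attributes.

```python
import pandas as pd

# Hypothetical three-row sample.
sample = pd.DataFrame({"order_id": [1, 1, 2], "quantity": [1, 1, 2]})

# Column labels are stored in an Index object, not a list.
print(type(sample.columns).__name__)

# Two equivalent ways to obtain a plain Python list of column names.
names_via_list = list(sample.columns)
names_via_tolist = sample.columns.tolist()

# A default row index is a RangeIndex; stop is exclusive, like range().
row_index = sample.index
print(row_index.start, row_index.stop, row_index.step)
```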

[12]: # describe: returns a summary of statistics for the numeric columns

df.describe()

[12]: order_id quantity

count 4622.000000 4622.000000
mean 927.254868 1.075725
std 528.890796 0.410186
min 1.000000 1.000000
25% 477.250000 1.000000
50% 926.000000 1.000000
75% 1393.000000 1.000000
max 1834.000000 15.000000

[13]: df.describe(include = "all")

[13]: order_id quantity item_name choice_description item_price

count 4622.000000 4622.000000 4622 3376 4622
unique NaN NaN 50 1043 78
top NaN NaN Chicken Bowl [Diet Coke] $8.75
freq NaN NaN 726 134 730
mean 927.254868 1.075725 NaN NaN NaN
std 528.890796 0.410186 NaN NaN NaN
min 1.000000 1.000000 NaN NaN NaN
25% 477.250000 1.000000 NaN NaN NaN
50% 926.000000 1.000000 NaN NaN NaN
75% 1393.000000 1.000000 NaN NaN NaN
max 1834.000000 15.000000 NaN NaN NaN

[14]: df.describe(percentiles = [0.1,0.3,0.24,0.44] )

[14]: order_id quantity

count 4622.000000 4622.000000
mean 927.254868 1.075725
std 528.890796 0.410186
min 1.000000 1.000000
10% 198.000000 1.000000
24% 458.040000 1.000000
30% 563.000000 1.000000
44% 818.000000 1.000000
50% 926.000000 1.000000
max 1834.000000 15.000000
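The non-integer value 458.04 at the 24% row comes from interpolation between neighbouring values. A minimal sketch of the same behaviour on a made-up five-value series (the name `values` is hypothetical), using `Series.quantile` with its default linear interpolation:

```python
import pandas as pd

# Hypothetical data: five evenly spaced values.
values = pd.Series([1, 2, 3, 4, 5])

# With linear interpolation, the 24th percentile sits 0.24 * (5 - 1) = 0.96
# positions into the sorted data, i.e. between the values 1 and 2.
q24 = values.quantile(0.24)
print(q24)

# describe() accepts the same fractions; they may be passed in any order.
summary = values.describe(percentiles=[0.3, 0.1])
print(summary)
```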

Integer Location (iloc) and Index Location (loc)

[16]: df.head()

[16]: order_id quantity item_name \

0 1 1 Chips and Fresh Tomato Salsa
1 1 1 Izze
2 1 1 Nantucket Nectar
3 1 1 Chips and Tomatillo-Green Chili Salsa
4 2 2 Chicken Bowl

choice_description item_price
0 NaN $2.39
1 [Clementine] $3.39
2 [Apple] $3.39
3 NaN $2.39
4 [Tomatillo-Red Chili Salsa (Hot), [Black Beans… $16.98

[17]: df.loc[(df.quantity == 15) | (df.item_name == "Nantucket Nectar")]

[17]: order_id quantity item_name \

2 1 1 Nantucket Nectar
22 11 1 Nantucket Nectar
105 46 1 Nantucket Nectar
173 77 1 Nantucket Nectar
205 91 1 Nantucket Nectar
436 189 1 Nantucket Nectar
601 247 2 Nantucket Nectar
925 381 1 Nantucket Nectar
1356 553 1 Nantucket Nectar
1585 641 1 Nantucket Nectar
1626 656 1 Nantucket Nectar
1706 690 1 Nantucket Nectar
2162 872 1 Nantucket Nectar
2379 947 2 Nantucket Nectar
2381 947 1 Nantucket Nectar
2430 965 1 Nantucket Nectar
2653 1053 1 Nantucket Nectar
2818 1118 1 Nantucket Nectar
2838 1128 1 Nantucket Nectar
2853 1133 1 Nantucket Nectar
2949 1172 1 Nantucket Nectar
3318 1330 1 Nantucket Nectar
3368 1351 1 Nantucket Nectar
3570 1433 1 Nantucket Nectar
3598 1443 15 Chips and Fresh Tomato Salsa
3845 1541 1 Nantucket Nectar
4019 1609 1 Nantucket Nectar
4078 1632 1 Nantucket Nectar

choice_description item_price
2 [Apple] $3.39
22 [Pomegranate Cherry] $3.39
105 [Pineapple Orange Banana] $3.39
173 [Apple] $3.39
205 [Peach Orange] $3.39
436 [Pomegranate Cherry] $3.39
601 [Pineapple Orange Banana] $6.78
925 [Pomegranate Cherry] $3.39
1356 [Pomegranate Cherry] $3.39
1585 [Peach Orange] $3.39
1626 [Pineapple Orange Banana] $3.39
1706 [Apple] $3.39
2162 [Pineapple Orange Banana] $3.39
2379 [Peach Orange] $6.78
2381 [Apple] $3.39
2430 [Pomegranate Cherry] $3.39
2653 [Pineapple Orange Banana] $3.39
2818 [Apple] $3.39
2838 [Peach Orange] $3.39
2853 [Apple] $3.39
2949 [Peach Orange] $3.39
3318 [Peach Orange] $3.39
3368 [Pineapple Orange Banana] $3.39
3570 [Pineapple Orange Banana] $3.39
3598 NaN $44.25
3845 [Peach Orange] $3.39
4019 [Pineapple Orange Banana] $3.39
4078 [Peach Orange] $3.39
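The filter above combines two boolean masks with `|` (element-wise OR). A minimal sketch on made-up rows (the names `sample`, `is_bulk` and `is_nectar` are hypothetical) that names each mask before combining them:

```python
import pandas as pd

# Hypothetical four-row sample.
sample = pd.DataFrame({
    "quantity": [1, 2, 15, 1],
    "item_name": [
        "Nantucket Nectar",
        "Nantucket Nectar",
        "Chips and Fresh Tomato Salsa",
        "Izze",
    ],
})

# Each comparison yields a boolean Series aligned with the rows.
is_bulk = sample["quantity"] == 15
is_nectar = sample["item_name"] == "Nantucket Nectar"

# | keeps rows where either mask is True; & keeps rows where both are True.
# When comparisons are written inline, each needs its own parentheses
# because | and & bind more tightly than == and >=.
either = sample.loc[is_bulk | is_nectar]
both = sample.loc[(sample["quantity"] >= 2) & is_nectar]

print(either)
print(both)
```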

[18]: # loc: select rows by label or boolean mask; a single row label returns a Series

print(type(df.loc[1]))
print(type(df.iloc[1]))
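The printed output of this cell is not included in the source. As a sketch on made-up data (the name `sample` is hypothetical): selecting a single row with a scalar returns a `Series`, while passing a one-element list keeps the result as a `DataFrame`.

```python
import pandas as pd

# Hypothetical three-row sample with the default 0..2 index.
sample = pd.DataFrame({"order_id": [1, 1, 2], "quantity": [1, 1, 2]})

# A scalar label or position selects one row and returns it as a Series.
row_by_label = sample.loc[1]
row_by_position = sample.iloc[1]

# A list selector, even with one element, returns a DataFrame.
frame_by_label = sample.loc[[1]]

print(type(row_by_label).__name__)
print(type(row_by_position).__name__)
print(type(frame_by_label).__name__)
```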

[19]: df.loc[(df.quantity == 2) & (df.item_name == "Nantucket Nectar"), ["order_id", "quantity", "item_name"]]

[19]: order_id quantity item_name

601 247 2 Nantucket Nectar
2379 947 2 Nantucket Nectar

[20]: df.loc[(df.quantity ==2) & (df.item_name == "Nantucket Nectar")]

[20]: order_id quantity item_name choice_description \

601 247 2 Nantucket Nectar [Pineapple Orange Banana]
2379 947 2 Nantucket Nectar [Peach Orange]

item_price
601 $6.78
2379 $6.78

[21]: df.loc[(df.quantity >= 2) & (df.item_name == "Nantucket Nectar")]

[21]: order_id quantity item_name choice_description \

601 247 2 Nantucket Nectar [Pineapple Orange Banana]
2379 947 2 Nantucket Nectar [Peach Orange]

item_price
601 $6.78
2379 $6.78

[22]: # iloc: select rows and columns by integer position

[23]: df.iloc[3:5, :-1]  # iloc can slice columns; :-1 keeps every column except the last

[23]: order_id quantity item_name \

3 1 1 Chips and Tomatillo-Green Chili Salsa
4 2 2 Chicken Bowl

choice_description
3 NaN
4 [Tomatillo-Red Chili Salsa (Hot), [Black Beans…

[24]: df.iloc[3:5]

[24]: order_id quantity item_name \

3 1 1 Chips and Tomatillo-Green Chili Salsa
4 2 2 Chicken Bowl

choice_description item_price
3 NaN $2.39
4 [Tomatillo-Red Chili Salsa (Hot), [Black Beans… $16.98

[25]: df.iloc[3:5, -1]  # show only the column at position -1 (the last column) for the selected rows

[25]: 3 $2.39
4 $16.98
Name: item_price, dtype: object
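One detail worth checking when moving between the two indexers: an `iloc` slice excludes its end position, whereas a `loc` slice includes its end label. A minimal sketch on made-up data (the name `sample` is hypothetical):

```python
import pandas as pd

# Hypothetical six-row sample with the default 0..5 index.
sample = pd.DataFrame({
    "a": range(6),
    "b": range(10, 16),
    "c": list("uvwxyz"),
})

# Positional slice: rows at positions 3 and 4 (end is exclusive).
by_position = sample.iloc[3:5]

# Label slice: rows labelled 3, 4 and 5 (end is inclusive).
by_label = sample.loc[3:5]

# Column slicing by position: drop the last column, or take only the last one.
without_last = sample.iloc[:, :-1]
last_only = sample.iloc[3:5, -1]

print(by_position)
print(by_label)
print(last_only)
```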

0.0.1 Data Manipulation

[27]: df.item_price.dtype

[27]: dtype('O')

[28]: # Converting a column's type in a DataFrame

# using apply

[31]: df["item_price"] = df["item_price"].apply(lambda x: x.replace("$", ""))  # strip the dollar sign; values are still strings

[37]: df["item_price"].dtype

[37]: dtype('O')
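The dtype is unchanged because removing the `$` only edits the text; every value is still a string. As an alternative to `apply` with a lambda, the vectorized `.str` accessor can do the same cleanup. This is a sketch on made-up prices (the names `prices` and `cleaned` are hypothetical), not the notebook's own method:

```python
import pandas as pd

# Hypothetical price strings, with trailing spaces as raw files sometimes have.
prices = pd.Series(["$2.39 ", "$3.39 ", "$16.98 "])

# regex=False treats "$" as a literal character rather than a regex anchor.
cleaned = prices.str.replace("$", "", regex=False).str.strip()

print(cleaned.tolist())
# Every element is still a Python str, so no numeric dtype appears yet.
print(all(isinstance(value, str) for value in cleaned))
```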

[33]: df.head()

[33]: order_id quantity item_name \

0 1 1 Chips and Fresh Tomato Salsa
1 1 1 Izze
2 1 1 Nantucket Nectar
3 1 1 Chips and Tomatillo-Green Chili Salsa
4 2 2 Chicken Bowl

choice_description item_price
0 NaN 2.39
1 [Clementine] 3.39
2 [Apple] 3.39
3 NaN 2.39
4 [Tomatillo-Red Chili Salsa (Hot), [Black Beans… 16.98

[30]: print(df.dtype)  # raises AttributeError: a DataFrame has .dtypes (plural); only a Series has .dtype

---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_25428\910862699.py in ?()
----> 1 print(df.dtype)

C:\ProgramData\anaconda3\Lib\site-packages\pandas\core\generic.py in ?(self, name)
6295 and name not in self._accessors
6296 and self._info_axis._can_hold_identifiers_and_holds_name(name)
6297 ):
6298 return self[name]
-> 6299 return object.__getattribute__(self, name)

AttributeError: 'DataFrame' object has no attribute 'dtype'
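The error arises because `dtype` (singular) exists only on a `Series`; a `DataFrame` exposes `dtypes` (plural), one entry per column. A minimal sketch on made-up data (the name `sample` is hypothetical):

```python
import pandas as pd

# Hypothetical two-column sample.
sample = pd.DataFrame({"order_id": [1, 2], "item_price": [2.39, 16.98]})

# DataFrame: use .dtypes to get a Series mapping column name to dtype.
column_types = sample.dtypes
print(column_types)

# Series (a single column): use .dtype.
price_type = sample["item_price"].dtype
print(price_type)

# The singular attribute is not defined on the frame itself.
print(hasattr(sample, "dtype"))
```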

[ ]: df["item_price"] = df["item_price"].astype(float)
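`astype(float)` raises an error if any value cannot be parsed. Where that is a concern, `pd.to_numeric` with `errors="coerce"` is one alternative that turns unparseable entries into NaN. The sketch below uses made-up strings (the names `raw` and `converted` are hypothetical); whether coercion is appropriate depends on the data.

```python
import pandas as pd

# Hypothetical cleaned price strings, one of which is not a number.
raw = pd.Series(["2.39", " 3.39 ", "n/a"])

# Strip surrounding whitespace first, then convert.
# errors="coerce" replaces values that cannot be parsed with NaN.
converted = pd.to_numeric(raw.str.strip(), errors="coerce")

print(converted)
print(converted.isna().sum())
```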

[ ]: correlative = df[["order_id", "item_price"]].corr()

print(correlative)
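The output of this cell is not included in the source. To show what `corr()` returns, here is a sketch on made-up columns with known relationships (the names `sample` and `corr_matrix` are hypothetical). By default `corr()` computes pairwise Pearson correlation over numeric columns, which is why `item_price` has to be converted to float first.

```python
import pandas as pd

# Hypothetical data: y rises exactly with x, z falls exactly with x.
sample = pd.DataFrame({
    "x": [1, 2, 3, 4],
    "y": [2, 4, 6, 8],
    "z": [4, 3, 2, 1],
})

# Result is a symmetric matrix with 1.0 on the diagonal.
corr_matrix = sample.corr()
print(corr_matrix)

# Individual coefficients can be read with loc[row_label, column_label].
x_vs_y = corr_matrix.loc["x", "y"]
x_vs_z = corr_matrix.loc["x", "z"]
```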
