
Pandas

Uploaded by

inet.free.all

Intro: Structure

Last week's topics:
1. Why Pandas
2. Name
3. Index
4. Filtering

Today's lecture:
5. DataFrame
6. Basic Visualization
Topic 1: Pandas
Dataframe

• Put simply, a data frame is a table with rows and columns.

• Each column in a data frame is a Series object.
• Each row is made up of the elements at the same position in every Series.

Name     Age  City
Alice    24   New York
Bob      30   San Francisco
Charlie  22   Los Angeles

An example table (this is a DataFrame)
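As a rough illustration of the points above, the example table can be built by hand and inspected. The variable names (people, age_column, first_row) are only for illustration.

```python
import pandas as pd

# The example table from the slide, built column by column.
people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [24, 30, 22],
    'City': ['New York', 'San Francisco', 'Los Angeles'],
})

# Each column is a Series object.
age_column = people['Age']
print(type(age_column).__name__)   # Series

# A row collects the elements at the same position in every column.
first_row = people.iloc[0]
print(first_row['Name'], first_row['Age'], first_row['City'])
```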
Creating a DataFrame from a Dictionary

You can create a DataFrame from a dictionary where the keys are column names
and the values are lists of data:

import pandas as pd

data = {'A': [1, 2, 3], 'B': [4, 5, 6]}
df = pd.DataFrame(data)

Output:
   A  B
0  1  4
1  2  5
2  3  6
Creating a DataFrame from a List of Dictionaries

Each dictionary in the list represents a row of data:

data = [{'A': 1, 'B': 4}, {'A': 2, 'B': 5}]
df = pd.DataFrame(data)

Output:
   A  B
0  1  4
1  2  5
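A small sketch comparing the two ways of building a table, by column and by row. The names by_column and by_row are made up for this example.

```python
import pandas as pd

# Column-oriented: keys are column names, values are the column data.
by_column = pd.DataFrame({'A': [1, 2], 'B': [4, 5]})

# Row-oriented: one dictionary per row.
by_row = pd.DataFrame([{'A': 1, 'B': 4}, {'A': 2, 'B': 5}])

# Both spellings describe the same two-row table.
print(by_column.equals(by_row))
```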
Reading Data from a CSV File

Use pandas to read data from a CSV file with pd.read_csv():

df = pd.read_csv('file.csv')

CSV file content:
A,B
1,4
2,5
3,6

Output:
   A  B
0  1  4
1  2  5
2  3  6
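To try this without a file on disk, one option is to feed pd.read_csv an in-memory text buffer. The csv_text string below is a stand-in for the contents of 'file.csv' and is an assumption of this sketch.

```python
import io
import pandas as pd

# Stand-in for the contents of 'file.csv' (header line, then one row per line).
csv_text = "A,B\n1,4\n2,5\n3,6\n"

# read_csv accepts a path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```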
Example data

• The data describes the G7 countries.

• Each row is a country, and each column is a property of that country.
→ We will start using pandas to wrangle the data.

G7 Stats

                Population            GDP  Surface Area    HDI  Continent
Canada              35.467   1,785,387.00     9,984,670  0.913  America
France              63.951   2,833,687.00       640,679  0.888  Europe
Germany              80.94   3,874,437.00       357,114  0.916  Europe
Italy               60.665   2,167,744.00       301,336  0.873  Europe
Japan              127.061   4,602,367.00       377,930  0.891  Asia
United Kingdom      64.511   2,950,039.00       242,495  0.907  Europe
United States      318.523  17,348,075.00     9,525,067  0.915  America
Reading Data from a CSV File

Creating DataFrames manually can be tedious. Most of the time you will be
pulling the data from a database, a CSV file or the web.

df = pd.DataFrame({'Population': […], 'GDP': […], 'Surface Area': […], …},
                  columns=['Population', 'GDP', …])
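A sketch of how the constructor call might look when filled in with the values from the G7 Stats table. The slide elides the lists, so copying them from the table (rows in table order, Canada to United States) is an assumption of this example.

```python
import pandas as pd

# Values copied from the G7 Stats table, in the table's row order;
# units are whatever the table uses.
df = pd.DataFrame({
    'Population': [35.467, 63.951, 80.94, 60.665, 127.061, 64.511, 318.523],
    'GDP': [1785387, 2833687, 3874437, 2167744, 4602367, 2950039, 17348075],
    'Surface Area': [9984670, 640679, 357114, 301336, 377930, 242495, 9525067],
    'HDI': [0.913, 0.888, 0.916, 0.873, 0.891, 0.907, 0.915],
    'Continent': ['America', 'Europe', 'Europe', 'Europe', 'Asia',
                  'Europe', 'America'],
}, columns=['Population', 'GDP', 'Surface Area', 'HDI', 'Continent'])

# Without an explicit index, rows are labelled 0, 1, 2, ...
print(df.head(3))
```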
Dataframe Index

DataFrames also have indexes. As you can see in the table above, pandas
has assigned an auto-incrementing integer index. We can replace it with the
country names:

df.index = ['Canada', 'France', 'Germany', 'Italy', …]
df
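A minimal sketch of the index change on a three-row excerpt of the table. The excerpt is chosen only to keep the example short.

```python
import pandas as pd

# A three-row excerpt of the G7 table, enough to show the index change.
df = pd.DataFrame({'Population': [35.467, 63.951, 80.94],
                   'HDI': [0.913, 0.888, 0.916]})
print(df.index.tolist())         # default integer labels: [0, 1, 2]

# Replace the default labels with the country names, in row order.
df.index = ['Canada', 'France', 'Germany']
print(df.loc['France', 'HDI'])   # rows can now be looked up by name
```

The new list must have exactly one label per row, otherwise pandas raises an error.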
Dataframe axes

You can show the table's column labels and row labels with df.columns and
df.index:

df.columns
df.index

Output:
Index(['Population', 'GDP', 'Surface Area', 'HDI', 'Continent'], dtype='object')

Index(['Canada', 'France', 'Germany', 'Italy', 'Japan', 'United Kingdom',
       'United States'], dtype='object')
Dataframe basic info

You can show basic information about a table with the method df.info():

df.info()

For each column, the output shows:
• the non-null count, which reveals any null values in the Series
• the datatype (Dtype)

Why is this Dtype object?
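One way to explore the question is to look at the per-column dtypes directly. This sketch assumes a three-row excerpt of the G7 table. Text columns such as Continent hold Python strings, which pandas has traditionally reported as object (newer releases may show a dedicated string dtype instead).

```python
import pandas as pd

df = pd.DataFrame({
    'Population': [35.467, 63.951, 80.94],
    'Surface Area': [9984670, 640679, 357114],
    'Continent': ['America', 'Europe', 'Europe'],
}, index=['Canada', 'France', 'Germany'])

# One dtype per column: floats, integers, and a text column.
print(df.dtypes)

# Number of missing values per column (info() reports the complement,
# the non-null count).
missing = df.isna().sum()
print(missing)
```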

Dataframe basic info

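You can also show the table's statistical information (df.size is an
attribute, so it takes no parentheses):

df.size, df.info(), df.describe()

df.describe() reports, for each numeric column:
• the number of elements (count), the mean and the standard deviation
• the minimum, the 1st, 2nd and 3rd quartiles, and the maximum

What does it mean?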
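A short sketch of reading the summary, using only the Population column from the G7 table. The names population, summary and median are illustrative.

```python
import pandas as pd

population = pd.DataFrame({
    'Population': [35.467, 63.951, 80.94, 60.665, 127.061, 64.511, 318.523],
}, index=['Canada', 'France', 'Germany', 'Italy', 'Japan',
          'United Kingdom', 'United States'])

print(population.size)           # total number of cells (an attribute)

summary = population.describe()
print(summary)

# The '50%' row is the 2nd quartile (the median): half the countries
# have a population below this value.
median = summary.loc['50%', 'Population']
print(median)
```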
Indexing, Selection and Slicing

Individual columns can be selected with regular indexing (df['Population']),
and individual rows with loc (by label) or iloc (by position). In each case
the result is a Series:

df.loc['Canada'], df.iloc[0]

Output:
Population       35.467
GDP             1785387
Surface Area    9984670
HDI               0.913
Continent       America
Name: Canada, dtype: object
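The three selections can be compared side by side. This is a sketch on a three-row excerpt, with illustrative variable names.

```python
import pandas as pd

df = pd.DataFrame({
    'Population': [35.467, 63.951, 80.94],
    'Continent': ['America', 'Europe', 'Europe'],
}, index=['Canada', 'France', 'Germany'])

by_label = df.loc['Canada']      # row selected by index label
by_position = df.iloc[0]         # same row, selected by position
column = df['Population']        # column selected by name

# All three results are Series; a row Series is named after its label.
print(by_label.name, by_label.equals(by_position))
print(column.loc['France'])
```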
Indexing, Selection and Slicing

Multiple columns can be selected similarly, by passing a list of column
names. The result is a DataFrame:

df[['Population', 'GDP']]

Output: a DataFrame containing only the Population and GDP columns.
Indexing, Selection and Slicing

Slicing works differently: it acts at "row level", which can be
counterintuitive:

df[1:3]

Output: the rows at positions 1 and 2 (France and Germany).
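A small sketch of why this can surprise: the same square brackets pick rows when given a slice and a column when given a name. The four-row excerpt is an assumption made for brevity.

```python
import pandas as pd

df = pd.DataFrame({'Population': [35.467, 63.951, 80.94, 60.665]},
                  index=['Canada', 'France', 'Germany', 'Italy'])

# An integer slice inside df[...] picks rows by position; the stop is excluded.
rows = df[1:3]
print(rows.index.tolist())       # ['France', 'Germany']

# The same brackets with a name pick a column instead.
column = df['Population']
print(column.index.tolist())
```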
Indexing, Selection and Slicing

Row-level selection works better with loc and iloc, which are
recommended over regular "direct slicing" df[:].

loc selects rows matching the given index:
df.loc['France':'Italy', 'Population']

iloc selects rows by the numeric position of the index:
df.iloc[[0, 1, -1]]
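Both calls can be run on a five-row excerpt to see which rows come back. The excerpt and variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Population': [35.467, 63.951, 80.94, 60.665, 127.061],
    'HDI': [0.913, 0.888, 0.916, 0.873, 0.891],
}, index=['Canada', 'France', 'Germany', 'Italy', 'Japan'])

# loc is label-based; a label slice includes BOTH endpoints.
by_label = df.loc['France':'Italy', 'Population']
print(by_label.index.tolist())     # ['France', 'Germany', 'Italy']

# iloc is position-based; negative positions count from the end.
by_position = df.iloc[[0, 1, -1]]
print(by_position.index.tolist())  # ['Canada', 'France', 'Japan']
```

Note the difference from plain Python slices: with loc, the end label ('Italy') is part of the result.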
Conditional selection

Conditional selection works the same way as for a Series. A comparison
returns a boolean Series:

df['Population'] > 70

Output:
Canada            False
France            False
Germany            True
Italy             False
Japan              True
United Kingdom    False
United States      True
Name: Population, dtype: bool
Conditional selection

Passing the boolean Series to loc keeps only the rows where the condition
is True:

df.loc[df['Population'] > 70]

Output: the rows for Germany, Japan and the United States.
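The two steps, building the boolean Series and then filtering with it, can be sketched as follows. The names mask and large are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Population': [35.467, 63.951, 80.94, 60.665, 127.061, 64.511, 318.523],
}, index=['Canada', 'France', 'Germany', 'Italy', 'Japan',
          'United Kingdom', 'United States'])

# Step 1: one True/False per row.
mask = df['Population'] > 70

# Step 2: keep only the rows where the mask is True.
large = df.loc[mask]
print(large.index.tolist())      # ['Germany', 'Japan', 'United States']
```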
Remove stuff

drop removes elements: rows when axis=0 and columns when axis=1. You can
also use axis='rows' or axis='columns':

df.drop(['Italy', 'Canada'], axis=0)

Output: the table without the Italy and Canada rows.
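A sketch of both directions on a four-row excerpt. It also shows that drop returns a new DataFrame and leaves the original untouched.

```python
import pandas as pd

df = pd.DataFrame({
    'Population': [35.467, 63.951, 80.94, 60.665],
    'HDI': [0.913, 0.888, 0.916, 0.873],
}, index=['Canada', 'France', 'Germany', 'Italy'])

# axis=0: the labels refer to rows.
without_rows = df.drop(['Italy', 'Canada'], axis=0)
print(without_rows.index.tolist())     # ['France', 'Germany']

# axis='columns' (same as axis=1): the labels refer to columns.
without_column = df.drop(['HDI'], axis='columns')
print(without_column.columns.tolist()) # ['Population']

# drop returns a new DataFrame; df still has all rows and columns.
print(df.shape)
```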
Adding value

You can add a new column by assigning a Series. Missing values are
automatically filled with NaN:

df['Language'] = pd.Series(['French', 'German', 'Italian'],
                           index=['France', 'Germany', 'Italy'],
                           name='Language')
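A runnable sketch of the alignment: values are matched to rows by index label, and rows without a matching label (Canada here) get NaN. The four-row excerpt is an assumption.

```python
import pandas as pd

df = pd.DataFrame({'Population': [35.467, 63.951, 80.94, 60.665]},
                  index=['Canada', 'France', 'Germany', 'Italy'])

# The Series has no entry for 'Canada', so that cell is left missing.
df['Language'] = pd.Series(['French', 'German', 'Italian'],
                           index=['France', 'Germany', 'Italy'],
                           name='Language')

print(df['Language'].isna().tolist())  # [True, False, False, False]
```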
Adding value

You can add a new row similarly. DataFrame.append was removed in pandas 2.0,
so use pd.concat:

df = pd.concat([df, pd.Series({'Population': 3, 'GDP': 5},
                              name='China').to_frame().T])
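A sketch of adding a row with pd.concat, which works on current pandas releases. The two-row starting table and the name new_row are assumptions; the values 3 and 5 are the placeholders from the slide.

```python
import pandas as pd

df = pd.DataFrame({
    'Population': [35.467, 63.951],
    'GDP': [1785387, 2833687],
    'HDI': [0.913, 0.888],
}, index=['Canada', 'France'])

# The new row as a Series; its name becomes the row label.
new_row = pd.Series({'Population': 3, 'GDP': 5}, name='China')

# to_frame().T turns the Series into a one-row DataFrame for pd.concat.
df = pd.concat([df, new_row.to_frame().T])

print(df.index.tolist())         # ['Canada', 'France', 'China']
print(df.loc['China'])           # HDI was not given, so it is NaN
```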
Basic visualization

You can show the relationship between the population and GDP:
df.plot(kind='scatter', x='Population', y='GDP',
title='Population vs GDP')
Basic visualization

You can combine a pandas aggregation with a bar plot:

df.groupby('Continent')[['Population', 'GDP']].sum().plot(
    kind='bar', title='Total Population and GDP by Continent')
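It can help to look at the aggregated table before plotting it. This sketch uses the Population, GDP and Continent values from the G7 Stats table and leaves the plot call as a comment, since drawing it requires matplotlib.

```python
import pandas as pd

df = pd.DataFrame({
    'Population': [35.467, 63.951, 80.94, 60.665, 127.061, 64.511, 318.523],
    'GDP': [1785387, 2833687, 3874437, 2167744, 4602367, 2950039, 17348075],
    'Continent': ['America', 'Europe', 'Europe', 'Europe', 'Asia',
                  'Europe', 'America'],
}, index=['Canada', 'France', 'Germany', 'Italy', 'Japan',
          'United Kingdom', 'United States'])

# One row per continent, each cell the sum over that continent's countries.
totals = df.groupby('Continent')[['Population', 'GDP']].sum()
print(totals)

# With matplotlib installed, the table can be drawn as grouped bars:
# totals.plot(kind='bar', title='Total Population and GDP by Continent')
```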
Basic visualization

Or show the correlation between each pair of numeric columns:

df[['Population', 'GDP', 'Surface Area', 'HDI']].corr()

→ What are some insights from this correlation?
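A sketch for exploring the question with the numeric columns of the G7 Stats table. corr() uses the Pearson coefficient by default, and rounding is only for readability.

```python
import pandas as pd

df = pd.DataFrame({
    'Population': [35.467, 63.951, 80.94, 60.665, 127.061, 64.511, 318.523],
    'GDP': [1785387, 2833687, 3874437, 2167744, 4602367, 2950039, 17348075],
    'Surface Area': [9984670, 640679, 357114, 301336, 377930, 242495, 9525067],
    'HDI': [0.913, 0.888, 0.916, 0.873, 0.891, 0.907, 0.915],
}, index=['Canada', 'France', 'Germany', 'Italy', 'Japan',
          'United Kingdom', 'United States'])

# A square, symmetric table: one row and one column per numeric column,
# with 1.0 on the diagonal (each column correlates perfectly with itself).
corr = df[['Population', 'GDP', 'Surface Area', 'HDI']].corr()
print(corr.round(2))
```

Values close to 1 or -1 indicate a strong linear relationship, while values near 0 indicate a weak one. With only seven rows, a single large country can dominate the result.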
