
Unit III - Notes

Uploaded by Kannan


VEL TECH HIGH TECH
Dr. RANGARAJAN Dr. SAKUNTHALA ENGINEERING COLLEGE
An Autonomous Institution
Approved by AICTE - New Delhi, Affiliated to Anna University, Chennai
Accredited by NBA, New Delhi & Accredited by NAAC with “A” Grade & CGPA of 3.27

Course Code: 21AI35IT    Semester: III
Category: PROFESSIONAL CORE COURSE (PCC)    L T P C: 2 0 4 4
Course Title: DATA SCIENCE FOR ENGINEERS

COURSE OBJECTIVES:
• To describe the life cycle of Data Science and the computational environments for data scientists using Python.
• To describe the fundamentals of exploring and managing data with Python.
• To examine various data analytics techniques for labeled/columnar data using Python.
• To demonstrate a flexible range of data visualization techniques in Python.
• To describe various machine learning algorithms for data modeling with Python.

COURSE OUTCOMES:
On successful completion of this course, students will be able to:

| CO No. | Course Outcome | Bloom's Level |
|---|---|---|
| C305.3 | Understand the concepts of Pandas. | K2 |

UNIT III: INTRODUCTION TO PANDAS
Installing and Using Pandas, Introducing Pandas Objects, Data Indexing and Selection, Operating on Data in Pandas, Handling Missing Data.

INTRODUCTION TO PANDAS
Pandas is an open-source Python package for data analysis and manipulation. It provides data structures and operations for working with numerical tables and time series. Pandas is built on top of NumPy, so its operations on whole columns are both concise and fast. It is popular because it makes importing, cleaning, and analyzing data much easier. Pandas programs are ordinary Python programs: they can be written in any plain text editor (Notepad, Notepad++, and so on) and saved with a .py extension.

To install pandas, write pandas code, and perform the operations in this unit, Python must first be installed on the system.

Check if Python Is Already Present

To check whether Python is already installed, open the Command Prompt (press Windows key + R, type cmd, and press Enter) and run:
python --version

If Python is installed, the command prints its version. Otherwise, install Python and pip first (refer to a guide on installing Python on Windows or Linux).


Import Pandas in Python

Once pandas is installed (see the installation steps below), it can be imported and used.

To do this, open a Jupyter Notebook or a Python file and write:
import pandas as pd

Here, pd is the conventional alias for pandas. It does not make the code faster; it simply shortens later code (pd.Series instead of pandas.Series).
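A quick way to confirm that the installation worked is to import the package and print its version string. `pd.__version__` is a standard attribute; the exact version printed depends on your environment.

```python
import pandas as pd

# pd.__version__ holds the installed pandas version as a string
version = pd.__version__
print("pandas version:", version)
```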

How to Install or Download Python Pandas

Pandas can be installed on Windows, Linux, and macOS. The steps below cover Windows.

Install Pandas on Windows

Python Pandas can be installed on Windows in two ways:

Using pip

Using Anaconda

Install Pandas using pip

pip is the standard package manager used to install and manage Python packages (libraries). It downloads packages from the Python Package Index (PyPI), a large online repository.

Step 1: Launch Command Prompt

Press the Windows key or click the Start button to open the Start menu, type "cmd" in the search bar, and click Command Prompt. Alternatively, press Windows key + R, type "cmd", and press Enter.

Step 2: Run the Command

Install pandas with pip by running the following command in Command Prompt:
pip install pandas

1. Introduction to Pandas

• Pandas is a Python library used for data analysis and manipulation.

• It builds on NumPy and provides easy-to-use data structures for labeled data.

🧩 2. Core Pandas Data Structures

Pandas provides two primary objects:

🟦 A. Series – 1D Labeled Array

• A Series is a one-dimensional array-like object that can hold any data type.
• It includes an index which labels each element.

🔹 Syntax:
import pandas as pd

s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

print(s)
🔹 Output:
a    10
b    20
c    30
dtype: int64
🔹 Key Properties:

• s.index – returns the index labels (['a', 'b', 'c'])

• s.values – returns the data as a NumPy array ([10, 20, 30])

🔹 Use Cases:

• Time series data

• Single-column data
• Intermediate results in calculations
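A Series can also be built from a dictionary, in which case the keys become the index labels. The sketch below uses illustrative subject names and scores (not from the notes) to show `index`, `values`, and label lookup.

```python
import pandas as pd

# Dictionary keys become index labels; dictionary values become the data
scores = pd.Series({"math": 90, "physics": 85, "chemistry": 78})

labels = scores.index.tolist()   # ['math', 'physics', 'chemistry']
data = scores.values             # NumPy array of the values
physics = scores["physics"]      # lookup by label -> 85

print(labels)
print(data)
print(physics)
```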

🟨 B. DataFrame – 2D Labeled Table

• A DataFrame is a 2-dimensional labeled data structure with rows and columns.

• Think of it like a table or spreadsheet.

🔹 Syntax:
import pandas as pd

data = {
    'Name': ['Alice', 'Bob'],
    'Age': [25, 30]
}

df = pd.DataFrame(data)
print(df)
🔹 Output:
    Name  Age
0  Alice   25
1    Bob   30
🔹 Key Properties:

• df.columns – returns the column labels (['Name', 'Age'])

• df.index – returns the row index ([0, 1])
• df.values – returns the values as a 2D NumPy array

🔹 Accessing Data:
df['Name']    # Access a column (returns a Series)
df.loc[0]     # Access a row by its index label
df.iloc[1]    # Access a row by its integer position
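Each column of a DataFrame is itself a Series that shares the DataFrame's row index, and a single row is also returned as a Series (indexed by the column names). A small sketch using the same `Name`/`Age` data:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

age = df["Age"]     # a column comes back as a Series
row = df.loc[0]     # a row also comes back as a Series, indexed by column names

print(type(age).__name__)           # Series
print(age.index.equals(df.index))   # True: the column shares the row index
print(row["Name"])                  # Alice
print(df.shape)                     # (2, 2): 2 rows, 2 columns
```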

🧠 3. Differences Between Series and DataFrame

| Feature | Series | DataFrame |
|---|---|---|
| Dimensions | 1D | 2D |
| Data structure | Array with an index | Table with rows and columns |
| Use case | One column of data | Tabular data |
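The difference in dimensions can be checked with `ndim` and `shape`. Note that single brackets return a Series, while a list of column names (double brackets) returns a one-column DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

one_col_series = df["Age"]     # single label -> Series (1D)
one_col_frame = df[["Age"]]    # list of labels -> DataFrame (2D)

print(one_col_series.ndim, one_col_series.shape)   # 1 (2,)
print(one_col_frame.ndim, one_col_frame.shape)     # 2 (2, 1)
```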

📌 4. Creating Pandas Objects from Various Data Types

| Source Type | Constructor Used | Example |
|---|---|---|
| List or array | pd.Series() | pd.Series([1, 2, 3]) |
| Dictionary | pd.DataFrame() | pd.DataFrame({'a': [1], 'b': [2]}) |
| NumPy array | pd.DataFrame() | pd.DataFrame(np.random.rand(2, 3)) (requires import numpy as np) |
| CSV/Excel/SQL | pd.read_csv(), pd.read_excel(), pd.read_sql() | pd.read_csv('file.csv') |
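The constructors in the table can be tried directly. To keep this sketch self-contained, the CSV case reads from an in-memory string through `io.StringIO` instead of a real `file.csv`; the CSV contents are illustrative.

```python
import io

import numpy as np
import pandas as pd

from_list = pd.Series([1, 2, 3])                  # default index 0, 1, 2
from_dict = pd.DataFrame({"a": [1], "b": [2]})    # dict keys become column names
from_array = pd.DataFrame(np.random.rand(2, 3))   # columns default to 0, 1, 2

csv_text = "name,age\nAlice,25\nBob,30\n"
from_csv = pd.read_csv(io.StringIO(csv_text))     # read_csv accepts file-like objects

print(from_list.index.tolist())     # [0, 1, 2]
print(from_dict.columns.tolist())   # ['a', 'b']
print(from_array.shape)             # (2, 3)
print(from_csv)
```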

✅ 5. Summary

• Series: 1D data with labels

• DataFrame: 2D data with row and column labels
• Pandas simplifies loading, transforming, and analyzing structured data

Data Indexing and Selection

1. Indexing in Series
A Series in Pandas can be indexed by:

• Position (like a list)

• Label (like a dictionary)

✅ Example:
import pandas as pd
s = pd.Series([100, 200, 300], index=['a', 'b', 'c'])

🔹 Accessing Elements:
s['a']        # 100 (by label)
s.iloc[1]     # 200 (by position; plain s[1] also works, but recent pandas versions warn that this use is deprecated)

🔹 Slicing:
s['a':'c']     # Label slice: includes both the start and end labels (100, 200, 300)
s.iloc[0:2]    # Positional slice, same as s[0:2]: end excluded, like list slicing (100, 200)
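Label and positional slicing are easiest to compare side by side: a label slice includes its end label, while a positional slice excludes its end position. Using `.loc` and `.iloc` makes the intent explicit.

```python
import pandas as pd

s = pd.Series([100, 200, 300], index=["a", "b", "c"])

by_label = s.loc["a":"b"]     # labels 'a' through 'b', end INCLUDED
by_position = s.iloc[0:2]     # positions 0 and 1, end EXCLUDED

print(by_label.tolist())        # [100, 200]
print(by_position.tolist())     # [100, 200]
print(s.loc["b":"c"].tolist())  # [200, 300]
print(s.iloc[1:2].tolist())     # [200]
```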

2. Indexing in DataFrame
A DataFrame supports:

• Column selection
• Row selection
• Element access
• Boolean indexing

✅ Example:
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35]}
df = pd.DataFrame(data)

🟦 A. Column Selection
df['Name'] # Single column (as Series)
df[['Name', 'Age']] # Multiple columns (as DataFrame)

B. Row Selection
🔹 Using loc[] (label-based):
df.loc[0]                 # Row with label 0 (the first row here)
df.loc[0:1]               # Rows labeled 0 through 1 (end label included)
df.loc[df['Age'] > 25]    # Rows where Age > 25
🔹 Using iloc[] (integer position-based):
df.iloc[1]                # Second row
df.iloc[0:2]              # First two rows (end position excluded)
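`loc` and `iloc` only appear to agree when the row labels happen to be 0, 1, 2, ... After sorting (or any reordering), the original labels travel with their rows, so labels and positions diverge:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]})

# Sort by Age, highest first; each row keeps its original label
sorted_df = df.sort_values(by="Age", ascending=False)
print(sorted_df.index.tolist())     # [2, 1, 0]

print(sorted_df.loc[0, "Name"])     # Alice   (row whose LABEL is 0)
print(sorted_df.iloc[0]["Name"])    # Charlie (row at POSITION 0)
```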

C. Accessing Individual Elements

df.loc[0, 'Name']    # 'Alice' (row label 0, column 'Name')
df.iloc[1, 0]        # 'Bob' (row position 1, column position 0)
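For reading or writing a single value, pandas also provides `.at` (label-based) and `.iat` (position-based), which accept only scalar row and column keys:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]})

name = df.at[0, "Name"]     # label-based scalar access -> 'Alice'
second = df.iat[1, 0]       # position-based scalar access -> 'Bob'

df.at[2, "Age"] = 36        # scalar assignment by label
print(name, second, df.at[2, "Age"])
```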

3. Boolean Indexing / Conditional Selection

Used to filter rows based on condition(s):

df[df['Age'] > 25]
✅ Example:
# Print rows where Age > 25 (Bob and Charlie)
print(df[df['Age'] > 25])

4. Using Conditions with Multiple Filters

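• Use & for AND, | for OR, and ~ for NOT (Python's and/or/not keywords do not work element-wise on a Series)
• Enclose each condition in parentheses, because & and | bind more tightly than comparison operators

df[(df['Age'] > 25) & (df['Name'] != 'Bob')]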
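The sketch below combines conditions with `|` (OR) and `~` (NOT) on the same three-row DataFrame, and shows `isin()`, a compact alternative to chaining several equality tests:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]})

either = df[(df["Age"] < 26) | (df["Name"] == "Charlie")]   # OR
not_bob = df[~(df["Name"] == "Bob")]                         # NOT
in_list = df[df["Name"].isin(["Alice", "Bob"])]              # membership test

print(either["Name"].tolist())    # ['Alice', 'Charlie']
print(not_bob["Name"].tolist())   # ['Alice', 'Charlie']
print(in_list["Name"].tolist())   # ['Alice', 'Bob']
```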

📌 Summary Table
| Method | Description | Example |
|---|---|---|
| df['col'] | Access a column | df['Name'] |
| df.loc[] | Label-based row/column access | df.loc[0, 'Age'] |
| df.iloc[] | Integer-position-based access | df.iloc[1, 0] |
| Boolean indexing | Conditional row selection | df[df['Age'] > 30] |
| Slicing | Range of rows/columns | df[0:2], df.loc[1:2] |

Operating on Data in Pandas

This topic covers how to perform arithmetic, statistical, and functional operations on
data stored in Pandas Series and DataFrame objects.

1. Element-wise Operations
Pandas supports element-wise arithmetic operations between:

• Series and Series

• DataFrame and DataFrame
• DataFrame and scalar value

Example:

import pandas as pd

df = pd.DataFrame({
    'A': [10, 20, 30],
    'B': [5, 15, 25]
})

# Add 5 to each element
print(df + 5)

# Multiply column A by 2
print(df['A'] * 2)

# Subtract column B from column A, row by row
print(df['A'] - df['B'])
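Arithmetic between two Series aligns on index labels, not positions. Labels present in only one operand produce NaN, unless a `fill_value` is supplied through the method form (`add`, `sub`, `mul`, ...). The labels below are illustrative:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])

plain = s1 + s2                     # 'a' and 'd' have no partner -> NaN
filled = s1.add(s2, fill_value=0)   # a missing partner is treated as 0

print(plain.tolist())    # [nan, 12.0, 23.0, nan]
print(filled.tolist())   # [1.0, 12.0, 23.0, 30.0]
```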

2. Statistical and Aggregation Functions

Pandas provides built-in functions to summarize and describe data.

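| Function | Description |
|---|---|
| sum() | Sum of values |
| mean() | Average (mean) |
| median() | Median value |
| std() | Standard deviation |
| min() | Minimum value |
| max() | Maximum value |
| count() | Number of non-null values |
| describe() | Summary statistics (count, mean, std, min, quartiles, max) |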

Example:
df.sum()         # Column-wise sum
df.mean()        # Column-wise mean
df.describe()    # Summary statistics
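Aggregations run down each column by default (`axis=0`); `axis=1` aggregates across each row instead, and `agg()` applies several functions at once. A sketch using the `A`/`B` DataFrame from the element-wise example:

```python
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30], "B": [5, 15, 25]})

col_sums = df.sum()                # one value per column: A=60, B=45
row_sums = df.sum(axis=1)          # one value per row: 15, 35, 55
summary = df.agg(["min", "max"])   # rows 'min' and 'max', one column per original column

print(col_sums.tolist())   # [60, 45]
print(row_sums.tolist())   # [15, 35, 55]
print(summary)
```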

3. Function Application
You can apply custom or built-in functions to:

• A column (Series)
• The entire DataFrame

Using apply():
# Square each value in column A
df['A'].apply(lambda x: x ** 2)

Using applymap() (element-wise on a whole DataFrame):
df.applymap(lambda x: x * 2)
# In pandas 2.1 and later, applymap() is deprecated in favour of DataFrame.map():
# df.map(lambda x: x * 2)
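`apply()` can also work row by row with `axis=1`, and `Series.map()` transforms each value of one column through a function or a lookup dictionary. The `high`/`low` and number-word labels below are purely illustrative:

```python
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30], "B": [5, 15, 25]})

# axis=1 passes each row (as a Series) to the function
row_total = df.apply(lambda row: row["A"] + row["B"], axis=1)

# Series.map with a function: element-wise on a single column
labels = df["A"].map(lambda x: "high" if x > 15 else "low")

# Series.map with a dict: values missing from the dict become NaN
names = df["B"].map({5: "five", 15: "fifteen"})

print(row_total.tolist())   # [15, 35, 55]
print(labels.tolist())      # ['low', 'high', 'high']
print(names.tolist())       # ['five', 'fifteen', nan]
```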

4. Broadcasting
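Broadcasting allows arithmetic between objects of different shapes:

• A scalar is applied to every element
• A Series is aligned with the DataFrame's column labels by default; passing axis=0 to a method such as sub() aligns it with the row index instead

Example:
s = pd.Series([1, 2, 3])
df = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60]})

# Subtract s from every column, matching s to the row index (axis=0)
df.sub(s, axis=0)    # A: 9, 18, 27   B: 39, 48, 57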
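The `axis` argument decides what the Series is matched against. Without it, the Series labels are matched to column names; with `axis=0`, they are matched to row labels. The offset values below are illustrative:

```python
import pandas as pd

df = pd.DataFrame({"A": [10, 20, 30], "B": [40, 50, 60]})

# Default: Series labels align with COLUMN names and apply to every row
col_offsets = pd.Series({"A": 1, "B": 2})
by_column = df - col_offsets

# axis=0: Series labels align with ROW labels and apply to every column
row_offsets = pd.Series([1, 2, 3])
by_row = df.sub(row_offsets, axis=0)

print(by_column["B"].tolist())   # [38, 48, 58]
print(by_row["A"].tolist())      # [9, 18, 27]
```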

🎯 5. Sorting and Ranking

🔹 Sorting:
df.sort_values(by='A')    # Sort rows by column A
df.sort_index()           # Sort rows by index label

🔹 Ranking:
df['A'].rank()            # Rank values (1 = smallest)
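`sort_values()` accepts `ascending=False` and a list of columns, and `rank()` resolves ties according to its `method` argument; the default, `'average'`, gives tied values the mean of the ranks they span. The names and scores are illustrative:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["P", "Q", "R", "S"], "Score": [70, 85, 85, 60]})

desc = df.sort_values(by="Score", ascending=False)   # highest score first
two_keys = df.sort_values(by=["Score", "Name"], ascending=[False, True])

avg_rank = df["Score"].rank()               # ties share the average rank
min_rank = df["Score"].rank(method="min")   # ties share the lowest rank

print(desc["Score"].tolist())     # [85, 85, 70, 60]
print(two_keys["Name"].tolist())  # ['Q', 'R', 'P', 'S']
print(avg_rank.tolist())          # [2.0, 3.5, 3.5, 1.0]
print(min_rank.tolist())          # [2.0, 3.0, 3.0, 1.0]
```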

🧪 6. Data Type Operations

You can check and convert data types using:

df.dtypes          # Check column data types
df.astype(int)     # Convert all columns to int (returns a new DataFrame)
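`astype()` also accepts a dictionary to convert only selected columns, and `pd.to_numeric(..., errors='coerce')` is a common way to convert text that may contain invalid entries, turning them into NaN instead of raising an error. The `qty`/`price` columns are illustrative:

```python
import pandas as pd

df = pd.DataFrame({"qty": [1.0, 2.0, 3.0], "price": ["9.5", "n/a", "4.0"]})

df = df.astype({"qty": "int64"})                            # float column -> integer column
df["price"] = pd.to_numeric(df["price"], errors="coerce")   # 'n/a' cannot be parsed -> NaN

print(df.dtypes)
print(df["price"].isna().tolist())   # [False, True, False]
```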

📌 Summary Table
| Operation Type | Method / Function | Example |
|---|---|---|
| Arithmetic | +, -, *, / | df['A'] + 5 |
| Aggregation | sum(), mean(), etc. | df.mean() |
| Apply function | apply(), applymap() (map() in pandas 2.1+) | df.apply(lambda x: x * 2) |
| Sorting & ranking | sort_values(), rank() | df.sort_values('A') |
| Type conversion | astype() | df.astype('float') |

Handling Missing Data in Pandas

Objective:

Learn how to identify, remove, and fill missing values (NaNs) in Series and DataFrame
objects using Pandas.

1. What is Missing Data?

• Missing data is represented as:
  - NaN (Not a Number)
  - None (Python's null value)
• Common causes:
  - Incomplete data entries
  - Failed data imports
  - Data corruption
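In a numeric column, pandas stores `None` as `NaN` (and upcasts integers to float so that NaN fits); in a text column, `None` may be kept as-is depending on the dtype. Either way, both count as missing:

```python
import numpy as np
import pandas as pd

nums = pd.Series([1, None, 3])
text = pd.Series(["x", None, "z"])

print(nums.dtype)                # float64: None was stored as NaN
print(nums.isnull().tolist())    # [False, True, False]
print(text.isnull().tolist())    # [False, True, False]
print(np.isnan(nums[1]))         # True
```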

🔍 2. Detecting Missing Data

✅ Use isnull() and notnull()
import pandas as pd
import numpy as np

df = pd.DataFrame({
'Name': ['Alice', 'Bob', None],
'Age': [25, np.nan, 30]
})

print(df.isnull()) # Returns True for missing values

print(df.notnull()) # Returns True for non-missing values
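`isna()` and `notna()` are aliases of `isnull()` and `notnull()`. Combined with boolean indexing, they select the rows that do or do not have a value in a given column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", None], "Age": [25, np.nan, 30]})

same = df.isna().equals(df.isnull())   # True: the two names are aliases
missing_age = df[df["Age"].isna()]     # rows where Age is missing
has_name = df[df["Name"].notna()]      # rows where Name is present

print(same)
print(missing_age["Name"].tolist())    # ['Bob']
print(has_name["Name"].tolist())       # ['Alice', 'Bob']
```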

🧹 3. Dropping Missing Data

🔹 dropna(): remove rows or columns containing NaN values
✅ Drop rows with any missing value:
df.dropna()
✅ Drop columns with any missing value:
df.dropna(axis=1)
✅ Drop rows where all values are NaN:
df.dropna(how='all')
✅ Keep only rows with at least 2 non-NaN values (drop the rest):
df.dropna(thresh=2)
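The effect of each `dropna()` option is easiest to compare by the shape of the result. The three-column frame below is an illustrative assumption; the `subset=` argument restricts the check to chosen columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, np.nan, np.nan, 4],
    "B": [5, np.nan, 7, 8],
    "C": [np.nan, np.nan, 9, 10],
})

print(df.dropna().shape)               # (1, 3): only the last row has no NaN
print(df.dropna(how="all").shape)      # (3, 3): row 1 is entirely NaN
print(df.dropna(thresh=2).shape)       # (3, 3): rows with at least 2 non-NaN values
print(df.dropna(axis=1).shape)         # (4, 0): every column contains a NaN
print(df.dropna(subset=["B"]).shape)   # (3, 3): only column B is checked
```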

🧴 4. Filling Missing Data

🔹 fillna(): replace NaN values with a given value
✅ Replace with a constant:
df.fillna(0)
df.fillna('Unknown')
✅ Forward fill (propagate the last valid value forward):
df.ffill()    # older form df.fillna(method='ffill') is deprecated since pandas 2.1
✅ Backward fill (propagate the next valid value backward):
df.bfill()    # older form: df.fillna(method='bfill')
✅ Fill using the column mean (or median):
df['Age'] = df['Age'].fillna(df['Age'].mean())
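The sketch below compares the fill strategies on a small illustrative frame, including `fillna()` with a dictionary that gives each column its own fill value:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", None, "Charlie"], "Age": [25, np.nan, 35]})

forward = df.ffill()    # copy the previous row's value down
backward = df.bfill()   # copy the next row's value up
per_column = df.fillna({"Name": "Unknown", "Age": df["Age"].mean()})

print(forward["Age"].tolist())       # [25.0, 25.0, 35.0]
print(backward["Name"].tolist())     # ['Alice', 'Charlie', 'Charlie']
print(per_column["Age"].tolist())    # [25.0, 30.0, 35.0]
print(per_column["Name"].tolist())   # ['Alice', 'Unknown', 'Charlie']
```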

🧪 5. Interpolating Missing Data

Estimates missing numeric values from neighbouring values (linear interpolation by default):

df.interpolate()
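With the default linear method, the values are treated as equally spaced, so a gap between 10 and 40 spanning two missing entries is filled in equal steps. By default, a leading NaN with no earlier value is left unfilled:

```python
import numpy as np
import pandas as pd

s = pd.Series([10, np.nan, np.nan, 40])
filled = s.interpolate()            # linear by default
print(filled.tolist())              # [10.0, 20.0, 30.0, 40.0]

edge = pd.Series([np.nan, 1.0, 2.0]).interpolate()
print(edge.isna().tolist())         # [True, False, False]
```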

🧠 6. Checking for Any Missing Data

df.isnull().sum()    # Count of missing values per column
df.isnull().any()    # Per column: True if the column has any missing value
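Chaining one more reduction turns the per-column results into a single answer for the whole DataFrame, and `mean()` on the boolean mask gives the fraction of missing values per column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob", None], "Age": [25, np.nan, 30]})

per_column = df.isnull().sum()            # Name: 1, Age: 1
total_missing = df.isnull().sum().sum()   # missing cells in the whole DataFrame
any_missing = df.isnull().any().any()     # True if at least one cell is missing
share = df.isnull().mean()                # fraction missing per column

print(per_column.to_dict())        # {'Name': 1, 'Age': 1}
print(total_missing, any_missing)  # 2 True
print(share.round(2).to_dict())    # {'Name': 0.33, 'Age': 0.33}
```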

📝 Example: Handling Missing Data

import pandas as pd

data = {'Name': ['Alice', None, 'Charlie'], 'Age': [25, None, 30]}
df = pd.DataFrame(data)

# Fill the missing name with 'Unknown' and the missing age with the average age
df['Name'] = df['Name'].fillna('Unknown')
df['Age'] = df['Age'].fillna(df['Age'].mean())

📌 Summary Table
| Task | Method | Example |
|---|---|---|
| Detect missing data | isnull(), notnull() | df.isnull() |
| Drop rows with missing data | dropna() | df.dropna() |
| Fill missing data | fillna(value) | df.fillna(0) |
| Forward/backward fill | ffill(), bfill() | df.ffill() |
| Interpolation | interpolate() | df.interpolate() |
| Count missing values | isnull().sum() | df.isnull().sum() |
