Pandas
Handout (Approx. 5 Pages)
1. Introduction to Pandas
Pandas is a powerful open-source Python library used for data manipulation, analysis,
and cleaning. It provides high-level data structures and tools that make working with
structured data easy and efficient. Pandas is built on top of NumPy and is a core library in
data science, statistics, machine learning, and business analytics.
Pandas is especially useful when working with tabular data such as CSV files, Excel
spreadsheets, SQL tables, and time-series data.
2. Installing and Importing Pandas
2.1 Installation
Pandas can be installed using the Python package manager pip:
pip install pandas
2.2 Importing Pandas
The standard convention is to import Pandas with the alias pd:
import pandas as pd
This alias is used in almost all Pandas-based programs.
3. Core Data Structures in Pandas
3.1 Series
A Series is a one-dimensional labeled array capable of holding data of any type.
data = [10, 20, 30, 40]
series = pd.Series(data)
print(series)
Key features of Series:
• One-dimensional
• Indexed
• Can hold integers, floats, strings, or arbitrary Python objects
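The features listed above can be illustrated with a small sketch. The index labels 'a' to 'd' are assumptions made for this example; they are not part of the original data.

```python
import pandas as pd

# The labels 'a'..'d' are hypothetical; any hashable values can serve as an index.
series = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

by_label = series['b']          # look up a value by its index label
by_position = series.iloc[0]    # look up a value by its integer position
doubled = series * 2            # arithmetic is applied element-wise

print(by_label, by_position)
print(doubled)
```

When no index is supplied, Pandas assigns the default integer labels 0, 1, 2, and so on.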
3.2 DataFrame
A DataFrame is a two-dimensional data structure similar to a table in a database or an
Excel spreadsheet.
data = {
'Name': ['Ali', 'Sara', 'John'],
'Age': [20, 22, 21],
'Department': ['CS', 'IT', 'SE']
}
df = pd.DataFrame(data)
print(df)
DataFrames are the most commonly used Pandas structure.
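As a minimal sketch, the same DataFrame can be inspected through a few basic attributes. Each column of a DataFrame is itself a Series.

```python
import pandas as pd

data = {
    'Name': ['Ali', 'Sara', 'John'],
    'Age': [20, 22, 21],
    'Department': ['CS', 'IT', 'SE'],
}
df = pd.DataFrame(data)

print(df.shape)              # (number of rows, number of columns)
print(list(df.columns))      # column labels

ages = df['Age']             # selecting one column returns a Series
print(type(ages).__name__)
```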
4. Loading and Saving Data
4.1 Reading Data from Files
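df = pd.read_csv('data.csv')       # 'data.csv' is a placeholder file name
df = pd.read_excel('data.xlsx')    # placeholder file name; reading .xlsx files requires the openpyxl package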
4.2 Writing Data to Files
df.to_csv('output.csv', index=False)       # placeholder file name; index=False leaves out the row index
df.to_excel('output.xlsx', index=False)    # placeholder file name
Pandas also supports reading from and writing to SQL databases.
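The reading and writing functions can be tried without touching the disk. The sketch below assumes an in-memory text buffer (io.StringIO) in place of a real file, so it runs anywhere; with a real file, a path string would be passed instead.

```python
import io
import pandas as pd

df = pd.DataFrame({'Name': ['Ali', 'Sara', 'John'], 'Age': [20, 22, 21]})

# Write the DataFrame as CSV text into an in-memory buffer (stand-in for a file).
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)

# Read the CSV text back into a new DataFrame.
restored = pd.read_csv(io.StringIO(csv_text))
print(restored)
```

Writing with index=False keeps the row index out of the file, so reading it back does not produce an extra unnamed column.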
5. Exploring and Inspecting Data
Common methods for exploring data:
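print(df.head())        # first five rows
print(df.tail())        # last five rows
df.info()               # column names, non-null counts, and data types (prints directly)
print(df.describe())    # summary statistics for numeric columns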
These methods help you understand the structure, data types, and summary statistics of a dataset.
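A short sketch of typical inspection calls on the sample DataFrame from Section 3.2. The row counts passed to head() and tail() are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Ali', 'Sara', 'John'],
    'Age': [20, 22, 21],
    'Department': ['CS', 'IT', 'SE'],
})

first_two = df.head(2)      # first 2 rows
last_one = df.tail(1)       # last row
summary = df.describe()     # by default, statistics for numeric columns only

print(first_two)
print(last_one)
print(summary.loc['mean', 'Age'])
df.info()                   # prints its report directly and returns None
```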
6. Data Selection and Indexing
6.1 Selecting Columns
print(df['Name'])
6.2 Selecting Rows
print(df.loc[0])     # row whose index label is 0
print(df.iloc[1])    # second row, selected by integer position
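Label-based and position-based selection only differ visibly when the index is not the default 0, 1, 2. The sketch below assumes hypothetical ID labels 101 to 103 to make the difference clear.

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Ali', 'Sara', 'John'], 'Age': [20, 22, 21]},
    index=[101, 102, 103],   # hypothetical ID labels used as the index
)

row_by_label = df.loc[102]              # the row labeled 102
row_by_position = df.iloc[0]            # the first row, whatever its label
subset = df.loc[[101, 103], ['Name']]   # chosen rows and columns, by label

print(row_by_label['Name'])
print(row_by_position['Name'])
print(subset)
```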
6.3 Conditional Selection
print(df[df['Age'] > 20])
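Several conditions can be combined in one selection. This is a minimal sketch using the sample DataFrame; note that Pandas uses & (and) and | (or) with each condition wrapped in parentheses, rather than the Python keywords and/or.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Ali', 'Sara', 'John'],
    'Age': [20, 22, 21],
    'Department': ['CS', 'IT', 'SE'],
})

# Both conditions must hold (&).
mask = (df['Age'] > 20) & (df['Department'] != 'IT')
selected = df[mask]

# At least one condition must hold (|).
either = df[(df['Age'] < 21) | (df['Department'] == 'IT')]

print(selected)
print(either)
```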
7. Data Cleaning and Handling Missing Values
Real-world data often contains missing or incorrect values.
7.1 Checking for Missing Values
print(df.isnull().sum())    # number of missing values in each column
7.2 Handling Missing Values
df.fillna(0, inplace=True)    # replace missing values with 0
df.dropna(inplace=True)       # remove rows that contain missing values
Data cleaning is one of the most important uses of Pandas.
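The two strategies can be compared side by side. The sketch below assumes a missing age for one student (the data are made up for illustration) and assigns results to new variables instead of using inplace=True, so the original DataFrame stays available.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one age is missing.
df = pd.DataFrame({'Name': ['Ali', 'Sara', 'John'], 'Age': [20, np.nan, 21]})

missing_per_column = df.isnull().sum()            # count of missing values per column

filled = df.fillna({'Age': df['Age'].mean()})     # fill missing ages with the column mean
dropped = df.dropna()                             # or drop incomplete rows entirely

print(missing_per_column)
print(filled)
print(dropped)
```

Filling with the mean keeps every row, while dropping rows keeps only complete records; which one is appropriate depends on the dataset.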
8. Data Analysis and Operations
8.1 Sorting Data
df_sorted = df.sort_values(by='Age', ascending=True)    # returns a new, sorted DataFrame; df itself is unchanged
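A brief sketch of sorting in descending order. sort_values returns a new DataFrame, so the result has to be assigned to a variable to be kept.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Ali', 'Sara', 'John'],
    'Age': [20, 22, 21],
})

# Oldest first; the original row labels travel with their rows.
by_age_desc = df.sort_values(by='Age', ascending=False)

# reset_index(drop=True) renumbers the rows 0, 1, 2 after sorting.
renumbered = by_age_desc.reset_index(drop=True)

print(by_age_desc)
print(renumbered)
```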
8.2 Grouping Data
df.groupby('Department')['Age'].mean()
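Grouping is easier to see when a group contains more than one row. The sketch below assumes a small made-up dataset with two students per department and computes several statistics at once with agg.

```python
import pandas as pd

# Hypothetical data: two students in each department.
df = pd.DataFrame({
    'Department': ['CS', 'IT', 'CS', 'IT'],
    'Age': [20, 22, 24, 26],
})

mean_age = df.groupby('Department')['Age'].mean()

# agg computes several statistics per group in a single call.
stats = df.groupby('Department')['Age'].agg(['mean', 'count', 'max'])

print(mean_age)
print(stats)
```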
8.3 Applying Functions
df['Age_plus_1'] = df['Age'].apply(lambda x: x + 1)
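For simple arithmetic, the same result can usually be obtained without apply by operating on the whole column at once, which is typically faster on large data. apply remains useful for logic that has no direct column operation; the 'even'/'odd' labels below are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Ali', 'Sara', 'John'], 'Age': [20, 22, 21]})

applied = df['Age'].apply(lambda x: x + 1)    # calls the function once per value
vectorized = df['Age'] + 1                    # one operation on the whole column

print(applied.equals(vectorized))

# apply with custom per-value logic (hypothetical labels).
df['Parity'] = df['Age'].apply(lambda age: 'even' if age % 2 == 0 else 'odd')
print(df)
```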
9. Pandas with Databases and Other Libraries
Pandas integrates well with:
• NumPy – numerical computing
• Matplotlib / Seaborn – data visualization
• SQL databases – data storage and retrieval
• Scikit-learn – machine learning
Example: Reading data from MySQL
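df = pd.read_sql(query, connection)    # query is an SQL string; connection is an open database connection (for MySQL, typically a SQLAlchemy connection)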
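Running the MySQL example requires a database server and a driver. As a self-contained sketch, the version below swaps MySQL for an in-memory SQLite database from the Python standard library; the table name students and its columns are assumptions made for this example. The pd.read_sql call itself has the same form.

```python
import sqlite3
import pandas as pd

# In-memory SQLite database used as a stand-in for a MySQL server.
connection = sqlite3.connect(':memory:')
connection.execute('CREATE TABLE students (name TEXT, age INTEGER)')  # hypothetical table
connection.executemany(
    'INSERT INTO students VALUES (?, ?)',
    [('Ali', 20), ('Sara', 22), ('John', 21)],
)
connection.commit()

query = 'SELECT name, age FROM students WHERE age > 20 ORDER BY age'
result = pd.read_sql(query, connection)   # query result as a DataFrame
connection.close()

print(result)
```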
10. Applications of Pandas
Pandas is widely used in:
• Data analysis and reporting
• Business intelligence
• Financial analysis
• Scientific research
• Machine learning preprocessing
• Database reporting systems