Chapter 1: Python Pandas
Introduction & Need for Pandas
Pandas is a fast, powerful, and easy-to-use open-source data analysis and manipulation library
built on top of Python. It provides two main data structures: Series and DataFrame, which make
data cleaning, analysis, and visualization easier.
Series & DataFrame
• Series: A one-dimensional labeled array that can hold any data type.
• DataFrame: A two-dimensional labeled data structure with columns of potentially different types.
Difference Between Series and DataFrame
Aspect      Series                      DataFrame
Dimension   1-D                         2-D
Structure   Single column with index    Multiple rows & columns
Creation    From list, array, scalar    From dict of lists, list of dicts, CSV, etc.
Index       Single index                Row & column index
Use Case    Handle single-column data   Handle full tabular data
Creating Series & DataFrame
• From List: pd.Series([10, 20, 30])
• From Dictionary: pd.Series({'a':10, 'b':20})
• From Dictionary of Lists: pd.DataFrame({'Name':['A','B'],'Marks':[80,90]})
• From CSV: pd.read_csv('[Link]')
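The creation routes listed above can be tried out with a short sketch. It assumes pandas is imported under the usual alias pd; the variable names (s_list, s_dict, df) are chosen here for illustration only.

```python
import pandas as pd

# Series from a list: pandas assigns the default integer index 0, 1, 2
s_list = pd.Series([10, 20, 30])

# Series from a dictionary: the keys become the index labels
s_dict = pd.Series({'a': 10, 'b': 20})

# DataFrame from a dictionary of lists: each key becomes a column
df = pd.DataFrame({'Name': ['A', 'B'], 'Marks': [80, 90]})

print(s_list.index.tolist())  # [0, 1, 2]
print(s_dict['b'])            # 20
print(df.shape)               # (2, 2)
```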
Indexing & Selection
• loc[] → Label-based selection
• iloc[] → Position-based selection
• Boolean indexing: df[df['Marks'] > 50]
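A small sketch of the three selection styles, side by side. The row labels r1, r2, r3 and the sample marks are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['A', 'B', 'C'], 'Marks': [80, 45, 90]},
    index=['r1', 'r2', 'r3'],  # custom row labels so loc and iloc differ visibly
)

# loc[] uses labels: row 'r2', column 'Marks'
by_label = df.loc['r2', 'Marks']

# iloc[] uses integer positions: first row, second column
by_position = df.iloc[0, 1]

# Boolean indexing keeps only the rows where the condition is True
passed = df[df['Marks'] > 50]

print(by_label)               # 45
print(by_position)            # 80
print(passed.index.tolist())  # ['r1', 'r3']
```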
Handling Missing Data
• dropna(): Removes rows (default) or columns (axis=1) that contain missing values
• fillna(): Fills missing values with a given value or by a fill method (forward fill via ffill(), backward fill via bfill())
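The two approaches can be compared on a tiny DataFrame. This is a minimal sketch with made-up data; isna() and ffill() are standard pandas methods that the notes above do not list, and they are used here only to count the gaps and to show a forward fill.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['A', 'B', 'C'], 'Marks': [80.0, np.nan, 90.0]})

# Count the missing values in each column before deciding what to do
missing_per_column = df.isna().sum()

# Option 1: drop every row that contains a missing value
dropped = df.dropna()

# Option 2: replace missing marks with a fixed value
filled = df.fillna({'Marks': 0})

# Option 3: carry the previous valid value forward
forward = df.ffill()

print(missing_per_column['Marks'])  # 1
print(len(dropped))                 # 2
print(filled['Marks'].tolist())     # [80.0, 0.0, 90.0]
print(forward['Marks'].tolist())    # [80.0, 80.0, 90.0]
```

None of these calls changes df itself; each returns a new object.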
Adding/Deleting Columns & Sorting
• Add Column: df['NewCol'] = data
• Delete Column: df.drop('ColumnName', axis=1, inplace=True)
• Sort: df.sort_values(by='Column')
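The following sketch runs the three operations in sequence. The column name Bonus and the sample values are hypothetical; drop() is called without inplace=True here, so the original DataFrame is left untouched.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['A', 'B', 'C'], 'Marks': [80, 45, 90]})

# Add a column: a scalar is repeated for every row
df['Bonus'] = 5

# Delete a column: without inplace=True a new DataFrame is returned
trimmed = df.drop('Bonus', axis=1)

# Sort by a column, highest marks first
ranked = df.sort_values(by='Marks', ascending=False)

print(list(trimmed.columns))    # ['Name', 'Marks']
print(ranked['Name'].tolist())  # ['C', 'A', 'B']
```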
Aggregation & GroupBy
• Aggregate Functions: sum(), mean(), median(), mode(), std(), count(), min(), max()
• GroupBy Example: df.groupby('City')['Marks'].mean()
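As an illustration of the GroupBy example, the sketch below groups made-up marks by city. The city names and values are assumptions for the demo; agg() with a list of function names is one way to compute several aggregates at once.

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['Delhi', 'Mumbai', 'Delhi', 'Mumbai'],
    'Marks': [80, 60, 90, 70],
})

# One mean per city; the result is a Series indexed by City
avg = df.groupby('City')['Marks'].mean()

# Several aggregates at once; the result is a DataFrame
summary = df.groupby('City')['Marks'].agg(['min', 'max', 'count'])

print(avg['Delhi'])                  # 85.0
print(avg['Mumbai'])                 # 65.0
print(summary.loc['Delhi', 'max'])   # 90
```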
Descriptive Statistics
• df.describe() → Provides count, mean, std, min, max, and quartiles for numerical columns.
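A quick sketch of what the summary looks like on sample data (values made up for illustration). Note that the text column Name is left out of the result by default.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['A', 'B', 'C', 'D'], 'Marks': [40, 60, 80, 100]})

# describe() returns a DataFrame whose rows are the statistics
stats = df.describe()

print(list(stats.columns))         # ['Marks']
print(stats.loc['count', 'Marks']) # 4.0
print(stats.loc['mean', 'Marks'])  # 70.0
print(stats.loc['50%', 'Marks'])   # 70.0 (the median)
```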
Data Visualization
• Line Plot: df['col'].plot(kind='line')
• Bar Plot: df['col'].plot(kind='bar')
• Histogram: df['col'].plot(kind='hist')
• Box Plot: df.boxplot()
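The four plot types can be drawn together in one figure. This is a sketch that assumes matplotlib is installed (pandas plotting relies on it); the Agg backend is selected only so the script also runs on a machine without a display, and the sample marks are made up.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; remove this line to open a window
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'Marks': [80, 45, 90, 60, 75]})

# A 2 x 2 grid of axes, one per plot type
fig, axes = plt.subplots(2, 2, figsize=(8, 6))

df['Marks'].plot(kind='line', ax=axes[0, 0], title='Line')
df['Marks'].plot(kind='bar', ax=axes[0, 1], title='Bar')
df['Marks'].plot(kind='hist', ax=axes[1, 0], title='Histogram')
df.boxplot(column='Marks', ax=axes[1, 1])

fig.tight_layout()
# Call plt.show() (interactive backend) or fig.savefig(...) to view the result
```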
Important Functions Table
Operation        Method / Function
Read CSV         pd.read_csv('[Link]')
Write CSV        df.to_csv('[Link]')
Drop missing     df.dropna()
Fill missing     df.fillna(value=...)
Sort by value    df.sort_values(by='column_name')
Group by         df.groupby('column_name').agg(...)
Plot histogram   df['column'].hist() or plt.hist(df['column'])
Plot bar chart   plt.bar(x, y)
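The read and write functions in the table can be checked with a round trip. This sketch writes to an in-memory text buffer (io.StringIO) instead of a file on disk, which is an assumption made here so that no file name is needed; passing a path string works the same way.

```python
import io
import pandas as pd

df = pd.DataFrame({'Name': ['A', 'B'], 'Marks': [80, 90]})

# Write CSV text into the buffer; index=False leaves out the row labels
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Rewind the buffer and read the same text back into a new DataFrame
buffer.seek(0)
restored = pd.read_csv(buffer)

print(list(restored.columns))       # ['Name', 'Marks']
print(restored['Marks'].tolist())   # [80, 90]
```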
Tips & Common Errors
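• Always check for missing values before performing operations.
• axis=0 refers to rows (the index) and axis=1 refers to columns: drop(..., axis=1) removes a column, while sum(axis=0) adds down the rows and returns one total per column.
• Pay attention to inplace=True while dropping columns: it modifies the original DataFrame and returns None.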
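The axis and inplace tips are easy to mix up, so here is a small sketch with made-up subject columns that shows the effect of each.

```python
import pandas as pd

df = pd.DataFrame({'Maths': [80, 90], 'Science': [70, 60]})

# axis=0 collapses the rows: one total per column
col_totals = df.sum(axis=0)

# axis=1 collapses the columns: one total per row
row_totals = df.sum(axis=1)

# Without inplace=True, drop() returns a new DataFrame and df is unchanged
copy_without_science = df.drop('Science', axis=1)
columns_before = list(df.columns)

# With inplace=True, df itself is modified and the call returns None
result = df.drop('Science', axis=1, inplace=True)

print(col_totals['Maths'])   # 170
print(row_totals.tolist())   # [150, 150]
print(columns_before)        # ['Maths', 'Science']
print(list(df.columns))      # ['Maths']
print(result)                # None
```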
PYQs (Past Year Questions)
1. Differentiate between Series and DataFrame with example.
2. Write a program to create a DataFrame with columns Name and Marks, and display rows with Marks > 50.
3. How can you handle missing values in a DataFrame?
4. Write the Python statement to group data by City and find average Age.
5. Which function is used to display basic statistics of a DataFrame?