Unit 1: Data Handling using Pandas & Data Visualisation
Key Concepts
• Pandas is a Python library for data manipulation & analysis.
• Series: 1-D labeled array (like a single column).
• DataFrame: 2-D table (like an Excel sheet with rows + columns).
Basic Difference Between Series and DataFrame
Aspect      Series                      DataFrame
Dimension   1-D                         2-D
Structure   Single column with index    Multiple rows & columns
Creation    From list, array, scalar    From dict of lists, list of dicts, CSV etc.
Use Case    Handle single column data   Handle full tabular data
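The creation routes in the table can be sketched as follows. This is a minimal illustration, not the only way to build these objects; the names and marks are made-up sample values.

```python
import numpy as np
import pandas as pd

# Series: one labeled column of values (sample data is hypothetical)
s_list = pd.Series([85, 92, 78], index=["Asha", "Ravi", "Meena"])  # from a list
s_array = pd.Series(np.array([1.5, 2.5, 3.5]))                      # from a NumPy array
s_scalar = pd.Series(7, index=["a", "b", "c"])                      # scalar repeated for each label

# DataFrame: a table with rows and columns
df_from_dict = pd.DataFrame({"name": ["Asha", "Ravi"],
                             "marks": [85, 92]})                    # dict of lists
df_from_records = pd.DataFrame([{"name": "Asha", "marks": 85},
                                {"name": "Ravi", "marks": 92}])     # list of dicts

print(s_list.ndim, df_from_dict.ndim)   # number of dimensions of each object
marks_column = df_from_dict["marks"]    # selecting one column gives a Series
```

Both DataFrame routes here produce the same table; the dict-of-lists form is column-oriented, the list-of-dicts form is row-oriented.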
Important Methods / Formulae
Operation        Method / Function
Read CSV         pd.read_csv('filename.csv')
Write CSV        df.to_csv('filename.csv')
Drop missing     df.dropna()
Fill missing     df.fillna(value=...)
Sort by value    df.sort_values(by='column_name')
Group by         df.groupby('column_name').agg(...)
Plot histogram   df['column'].hist() or plt.hist(df['column'])
Plot bar chart   plt.bar(x, y)
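A short sketch of the data-handling methods above, used together. The student data is hypothetical, and io.StringIO stands in for a CSV file on disk so the example runs without any external file; a path such as 'filename.csv' would be passed the same way.

```python
import io
import pandas as pd

# Hypothetical student data; Ravi's marks are deliberately missing.
csv_text = """name,subject,marks
Asha,Maths,85
Ravi,Maths,
Meena,Science,78
Kiran,Science,92
"""

df = pd.read_csv(io.StringIO(csv_text))   # read CSV into a DataFrame

dropped = df.dropna()                     # remove rows that contain any missing value
filled = df.fillna(value={"marks": 0})    # replace missing marks with 0

# Sort by a column, highest marks first
ranked = filled.sort_values(by="marks", ascending=False)

# Group by subject, then compute the mean and maximum of marks per group
summary = filled.groupby("subject").agg(avg_marks=("marks", "mean"),
                                        max_marks=("marks", "max"))

# Write CSV; index=False keeps the row index out of the output
out = io.StringIO()
ranked.to_csv(out, index=False)
print(summary)
```

dropna(), fillna() and sort_values() each return a new DataFrame here, so df itself is left unchanged. Filling with 0 is only one possible choice; it lowers the Maths average in this sample, which is why the fill value should be chosen with the later analysis in mind.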
Things to Remember
• Always check for missing values before processing data.
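• axis=0 → works down the rows (df.sum(axis=0) gives one result per column; df.drop(labels, axis=0) drops rows), axis=1 → works across the columns (df.sum(axis=1) gives one result per row; df.drop(labels, axis=1) drops columns).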
• Use histogram for distribution, bar plot for category comparison, box plot for outliers.
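The three reminders above can be sketched in one short script. The DataFrame and its column names are hypothetical sample data, and the Agg backend is chosen only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")              # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical marks with one missing value in each numeric column
df = pd.DataFrame({
    "maths":   [85, 92, np.nan, 60, 75],
    "science": [78, np.nan, 88, 95, 70],
    "section": ["A", "B", "A", "B", "A"],
})

# 1. Check for missing values before processing: count of NaN per column
missing = df.isna().sum()

# 2. axis=0 vs axis=1
marks = df[["maths", "science"]]
col_means = marks.mean(axis=0)                 # collapses the rows: one mean per column
row_means = marks.mean(axis=1)                 # collapses the columns: one mean per row
without_science = df.drop("science", axis=1)   # for drop, axis=1 means "drop a column"

# 3. Choosing a plot type
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(12, 3))
ax1.hist(df["maths"].dropna(), bins=5)         # histogram: distribution of one numeric column
ax1.set_title("Histogram")
counts = df["section"].value_counts().sort_index()
ax2.bar(list(counts.index), counts.values)     # bar chart: comparison across categories
ax2.set_title("Bar chart")
ax3.boxplot(df["science"].dropna())            # box plot: points beyond the whiskers are outliers
ax3.set_title("Box plot")
fig.tight_layout()
# Call plt.show() in an interactive session, or fig.savefig(...) to write an image file.
```

Note that mean() skips missing values by default, so a row with one NaN still gets a mean from its remaining value.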
Sample PYQs
1. What methods in Pandas can be used to handle missing values? Explain with examples.
2. Given a DataFrame, how do you select rows by label vs position?
3. Plot histograms for a column, and explain what skewness means in that plot.
4. Given a CSV file with student data, write queries to sort, group by subject, and find the average marks.
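For question 2, a minimal sketch of label-based versus position-based selection, assuming a small hypothetical DataFrame with string row labels: loc selects by index label, iloc selects by integer position.

```python
import pandas as pd

# Hypothetical data with explicit row labels
df = pd.DataFrame({"marks": [85, 92, 78]}, index=["r1", "r2", "r3"])

by_label = df.loc["r2"]          # row whose index label is "r2"
by_position = df.iloc[1]         # second row, counting from 0

label_slice = df.loc["r1":"r2"]  # label slices include the end label
pos_slice = df.iloc[0:2]         # position slices exclude the end position
```

Both slices return the same two rows here, which shows the main difference to remember: a loc slice is inclusive of its end label, while an iloc slice follows normal Python slicing.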