Pandas Detailed Notes
What is Pandas?
• Pandas is a Python library for handling structured data (rows & columns).
• Built on top of NumPy for fast computations.
• Main structures: Series (1D) and DataFrame (2D).
• Used for importing, cleaning, analyzing, and visualizing datasets.
• Installation: pip install pandas
• Import: import pandas as pd
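As a minimal sketch of the two main structures (the names and values below are made up for illustration):

```python
import pandas as pd

# A Series is a one-dimensional labeled array.
ages = pd.Series([23, 31, 45], name="Age")

# A DataFrame is a two-dimensional table; each column is itself a Series.
df = pd.DataFrame({"Name": ["A", "B", "C"], "Age": [23, 31, 45]})

print(ages.ndim)         # number of dimensions of the Series
print(df.ndim)           # number of dimensions of the DataFrame
print(type(df["Age"]))   # selecting one column gives back a Series
```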
Importing Data
• From Local Computer (Colab): files.upload() + pd.read_csv()
• From Google Drive: drive.mount('/content/drive') + pd.read_csv(path)
• From URL: pd.read_csv(url)
• From Excel: pd.read_excel('[Link]')
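A small sketch of the loading step, assuming an in-memory string as a stand-in for a real file or URL (the column names are invented). pd.read_csv accepts a local path, a URL, or any file-like object, so the same call shape applies to each source above. The Colab-specific steps are shown only as comments because they need a Colab session.

```python
import io
import pandas as pd

# In Colab (not run here):
#   from google.colab import files; files.upload()
#   from google.colab import drive; drive.mount('/content/drive')

# Stand-in for a CSV file: any file-like object works in place of a path or URL.
csv_text = "Name,Age,FuelType\nA,23,Petrol\nB,31,Diesel\n"
df = pd.read_csv(io.StringIO(csv_text))

print(df)
```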
Understanding DataFrames
• DataFrame = tabular structure with rows & columns.
• Attributes: index, columns, shape, size, ndim, dtypes. Method: memory_usage().
• Example: df.shape → (rows, cols), df.dtypes → column types.
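The attributes listed above can be inspected on a toy DataFrame like this (a sketch; the data is invented):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Age": [23, 31, 45],
    "Price": [13500.0, 9800.0, 7200.0],
})

print(df.shape)           # (rows, cols)
print(df.size)            # rows * cols
print(df.ndim)            # 2 for a DataFrame
print(list(df.columns))   # column labels
print(df.dtypes)          # one dtype per column
print(df.memory_usage())  # bytes per column; note this one is a method call
```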
Viewing Data
• df.head(n): First n rows
• df.tail(n): Last n rows
• df.info(): Summary (types, nulls, memory)
• df.describe(): Stats summary for numeric columns
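A quick sketch of these four calls on invented data. Note that info() prints its summary and returns None, while the others return new objects.

```python
import pandas as pd

df = pd.DataFrame({
    "Age": list(range(20, 30)),
    "Price": [1000.0 * i for i in range(10)],
})

first3 = df.head(3)    # first 3 rows
last2 = df.tail(2)     # last 2 rows
df.info()              # prints dtypes, non-null counts, memory usage
stats = df.describe()  # count, mean, std, min, quartiles, max per numeric column

print(stats)
```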
Indexing & Selection
• Select column: df['Name']
• Select multiple columns: df[['Name','Age']]
• Select row: df.loc[0] (by label) or df.iloc[0] (by position)
• Row+Column: df.iloc[0,1] (by position), df.loc[0,'Age'] (by label)
• Conditional: df[df['Age']>25]
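The difference between label-based and position-based selection is easier to see with a non-numeric index. This is a sketch with invented row labels (r1, r2, r3):

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["A", "B", "C"], "Age": [23, 31, 45]},
    index=["r1", "r2", "r3"],  # hypothetical row labels
)

row_by_label = df.loc["r1"]          # label-based row lookup
row_by_pos = df.iloc[0]              # position-based row lookup
cell_by_pos = df.iloc[0, 1]          # row 0, column 1
cell_by_label = df.loc["r1", "Age"]  # row 'r1', column 'Age'
older = df[df["Age"] > 25]           # boolean mask keeps matching rows

print(older)
```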
Data Types
• Numeric: int64, float64
• Character/String: object
• Category: for fixed repeated values
• Check: df.dtypes, Convert: df['col'].astype('float')
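A short sketch of checking and converting types, assuming a KM column that was read in as text (the data is invented):

```python
import pandas as pd

df = pd.DataFrame({
    "KM": ["100", "250", "90"],                # numbers stored as strings
    "FuelType": ["Petrol", "Diesel", "Petrol"],
})

print(df.dtypes)  # before conversion

df["KM"] = df["KM"].astype("float")                  # text -> float64
df["FuelType"] = df["FuelType"].astype("category")   # repeated labels -> category

print(df.dtypes)  # after conversion
```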
Cleaning Data
• Replace: df['Doors'] = df['Doors'].replace({'three':3,'four':4}) (assign back; inplace=True on a selected column may not update df)
• Missing values: df.isnull().sum(), df.isna()
• Fill missing numeric: df['Age'] = df['Age'].fillna(df['Age'].mean())
• Fill missing categorical: df['FuelType'] = df['FuelType'].fillna(df['FuelType'].mode()[0])
• Drop missing: df.dropna(), Drop duplicates: df.drop_duplicates()
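The cleaning steps above might be chained as in the sketch below. The data is invented, and the Doors words are mapped to digit strings (an assumption made here so that one astype(int) call converts the whole column). Each step assigns its result back, since these methods return new objects.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Doors": ["three", "four", "5", "four", "four"],
    "Age": [20.0, np.nan, 40.0, np.nan, 30.0],
    "FuelType": ["Petrol", None, "Petrol", None, "Diesel"],
})

# Replace word values, then convert the column to integers.
df["Doors"] = df["Doors"].replace({"three": "3", "four": "4"}).astype(int)

# Count missing values per column before filling.
missing_before = df.isnull().sum()

# Numeric column: fill with the mean. Categorical column: fill with the mode.
df["Age"] = df["Age"].fillna(df["Age"].mean())
df["FuelType"] = df["FuelType"].fillna(df["FuelType"].mode()[0])

# Remove rows that are now exact duplicates.
df = df.drop_duplicates()

print(missing_before)
print(df)
```

Using df.dropna() instead of the two fillna steps would discard the incomplete rows rather than fill them.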
Functions & Control Structures
• Python if-else, loops, and functions can be used with DataFrames.
• Example: binning prices into Low/Medium/High using function and apply().
• def price_class(x): return 'Low' if x<10000 else 'High'
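A three-way version of the binning idea could look like the sketch below. The 10000 cutoff comes from the one-liner above; the 15000 cutoff for 'High' is a hypothetical value chosen only for illustration.

```python
import pandas as pd

def price_class(x, low=10000, high=15000):
    """Label a price as Low, Medium, or High (the 'high' cutoff is illustrative)."""
    if x < low:
        return "Low"
    elif x < high:
        return "Medium"
    return "High"

df = pd.DataFrame({"Price": [8000, 12000, 20000]})

# apply() calls the function once per value in the column.
df["PriceClass"] = df["Price"].apply(price_class)

print(df)
```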
Exploratory Data Analysis (EDA)
• Frequency tables: pd.crosstab(df['FuelType'], columns='count')
• Two-way tables: pd.crosstab(df['FuelType'], df['Automatic'])
• Joint Probability: normalize=True
• Marginal Probability: margins=True, normalize=True (the 'All' row and column hold the marginals)
• Conditional Probability: normalize='index'
• Correlation: df.corr(method='pearson') (pass numeric_only=True in pandas 2.x if df has non-numeric columns)
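The tables above can be compared side by side on a toy dataset (invented values). In this sketch, joint probabilities sum to 1 over the whole table, marginals appear in the 'All' row and column when margins=True, and normalize='index' makes each row sum to 1.

```python
import pandas as pd

df = pd.DataFrame({
    "FuelType": ["Petrol", "Petrol", "Diesel", "Petrol"],
    "Automatic": [0, 1, 0, 0],
    "Age": [20, 30, 40, 50],
    "Price": [15000, 12000, 9000, 7000],
})

freq = pd.crosstab(df["FuelType"], columns="count")      # one-way counts
two_way = pd.crosstab(df["FuelType"], df["Automatic"])   # two-way counts

# Joint probabilities: every cell divided by the grand total.
joint = pd.crosstab(df["FuelType"], df["Automatic"], normalize=True)

# Marginal probabilities: row and column totals in the 'All' margins.
marginal = pd.crosstab(df["FuelType"], df["Automatic"], margins=True, normalize=True)

# Conditional probabilities of Automatic given FuelType: each row sums to 1.
conditional = pd.crosstab(df["FuelType"], df["Automatic"], normalize="index")

# Pearson correlation restricted to the numeric columns.
corr = df[["Age", "Price"]].corr(method="pearson")

print(marginal)
print(conditional)
print(corr)
```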
Data Visualization with Matplotlib
• Scatter plot: plt.scatter(df['Age'], df['Price'])
• Histogram: plt.hist(df['KM'], bins=10)
• Bar plot: df['FuelType'].value_counts().plot(kind='bar')
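The three plots above could be drawn together as in this sketch. The data is invented, and the Agg backend is selected here only so the example runs without a display; in a notebook that line can be dropped and plt.show() called at the end.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display available)
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "Age": [20, 30, 40, 50, 60],
    "Price": [15000, 12000, 9000, 7000, 5000],
    "KM": [10000, 40000, 60000, 90000, 120000],
    "FuelType": ["Petrol", "Petrol", "Diesel", "Petrol", "Diesel"],
})

fig, axes = plt.subplots(1, 3, figsize=(12, 3))

axes[0].scatter(df["Age"], df["Price"])   # relationship between two numeric columns
axes[0].set_xlabel("Age")
axes[0].set_ylabel("Price")

axes[1].hist(df["KM"], bins=10)           # distribution of one numeric column
axes[1].set_xlabel("KM")

# Counts per category, drawn by pandas on the third axes.
df["FuelType"].value_counts().plot(kind="bar", ax=axes[2])

fig.tight_layout()
```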
Data Visualization with Seaborn
• sns.scatterplot(x='Age', y='Price', data=df, hue='FuelType')
• sns.histplot(df['Age'], bins=20, kde=True)
• sns.boxplot(x='FuelType', y='Price', data=df)
• sns.pairplot(df, hue='FuelType')
Exporting Data
• Save to CSV: df.to_csv('[Link]', index=False)
• Save to Excel: df.to_excel('[Link]', index=False)
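A round-trip sketch of the CSV export, writing to a temporary directory so nothing is left behind. The file name out.csv is hypothetical. Passing index=False keeps the row index out of the file, so reading it back gives the same columns.

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({"Name": ["A", "B"], "Age": [23, 31]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.csv")  # hypothetical file name
    df.to_csv(path, index=False)         # write without the index column
    round_trip = pd.read_csv(path)       # read it back to check the result

print(round_trip)
```

to_excel() follows the same pattern but needs an Excel writer package such as openpyxl installed.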
Summary
• 1. Load data (CSV, Excel, Google Drive).
• 2. Explore with head(), info(), describe().
• 3. Clean: replace, fillna, dropna, convert types.
• 4. Analyze: groupby, crosstab, correlation.
• 5. Visualize: Matplotlib & Seaborn plots.
• 6. Export with to_csv(), to_excel().