Pandas Notes

Copyright © All Rights Reserved

Pandas Detailed Notes

What is Pandas?
• Pandas is a Python library for handling structured data (rows & columns).
• Built on top of NumPy for fast computations.
• Main structures: Series (1D) and DataFrame (2D).
• Used for importing, cleaning, analyzing, and visualizing datasets.
• Installation: pip install pandas
• Import: import pandas as pd
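As a quick illustration of the two main structures, the sketch below builds a Series and a DataFrame by hand. The names and ages are hypothetical sample values, not data from these notes.

```python
import pandas as pd

# A Series: a one-dimensional labelled array (hypothetical sample values)
ages = pd.Series([23, 31, 27], name="Age")

# A DataFrame: a two-dimensional table whose columns are Series
df = pd.DataFrame({
    "Name": ["Asha", "Ben", "Chen"],
    "Age": [23, 31, 27],
})

print(ages.ndim)                          # 1
print(df.ndim)                            # 2
print(isinstance(df["Age"], pd.Series))   # True: selecting one column gives a Series
```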

Importing Data
• From Local Computer (Colab): from google.colab import files; files.upload() + pd.read_csv(filename)
• From Google Drive: from google.colab import drive; drive.mount('/content/drive') + pd.read_csv(path)
• From URL: pd.read_csv(url)
• From Excel: pd.read_excel(path) (reading .xlsx files needs an engine such as openpyxl installed)
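Outside Colab, the same `pd.read_csv` call works on any file-like object, which makes it easy to try without a real file. This sketch assumes a small hypothetical CSV held in a string; in practice you would pass a file path or URL instead.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for a CSV file on disk or at a URL
csv_text = """Name,Age,FuelType
Asha,23,Petrol
Ben,31,Diesel
Chen,27,Petrol
"""

# read_csv accepts paths, URLs, and file-like objects such as StringIO
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)   # (3, 3)
print(df.head())
```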

Understanding DataFrames
• DataFrame = tabular structure with rows & columns.
• Attributes: index, columns, shape, size, ndim, dtypes; memory_usage() is a method, not an attribute.
• Example: df.shape → (rows, cols), df.dtypes → column types.
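The attributes listed above can be inspected on a small hypothetical DataFrame like this. Note that `size` is rows × columns and that `memory_usage()` must be called.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "Name": ["Asha", "Ben", "Chen"],
    "Age": [23, 31, 27],
    "Price": [9500.0, 14200.0, 11800.0],
})

print(df.shape)          # (3, 3) -> (rows, columns)
print(df.size)           # 9 -> rows * columns
print(df.ndim)           # 2
print(df.dtypes)         # one dtype per column, e.g. int64 for Age, float64 for Price
print(list(df.columns))  # ['Name', 'Age', 'Price']
print(df.index)          # the row labels (a RangeIndex 0..2 here)

# memory_usage() is a method; it returns bytes per column, plus an 'Index' entry
print(df.memory_usage(deep=True))
```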

Viewing Data
• df.head(n): First n rows
• df.tail(n): Last n rows
• df.info(): Summary (column types, non-null counts, memory use)
• df.describe(): Stats summary for numeric columns
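A minimal sketch of the viewing methods on hypothetical data. `info()` prints its report and returns `None`, while `describe()` returns a DataFrame you can index into.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "Age": [23, 31, 27, 45, 38],
    "Price": [9500, 14200, 11800, 7600, 10100],
})

print(df.head(2))          # first 2 rows
print(df.tail(2))          # last 2 rows
df.info()                  # prints dtypes, non-null counts and memory use

summary = df.describe()    # count, mean, std, min, quartiles, max per numeric column
print(summary.loc["mean", "Age"])   # 32.8
```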

Indexing & Selection


• Select column: df['Name']
• Select multiple columns: df[['Name','Age']]
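• Select row: df.iloc[0] (by position) or df.loc[0] (by index label; this is the first row only when the index is the default 0, 1, 2, ...)
• Row+Column: df.iloc[0,1] (row and column positions), df.loc[0,'Age'] (row and column labels)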
• Conditional: df[df['Age']>25]
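The difference between `iloc` (position) and `loc` (label) only shows up when the index is not 0..n-1. This sketch deliberately uses a hypothetical index of 10, 20, 30 to make that visible.

```python
import pandas as pd

# Hypothetical data with a non-default index, so labels and positions differ
df = pd.DataFrame(
    {"Name": ["Asha", "Ben", "Chen"], "Age": [23, 31, 27]},
    index=[10, 20, 30],
)

print(df.iloc[0, 1])       # 23: row position 0, column position 1
print(df.loc[20, "Age"])   # 31: row label 20, column label 'Age'

# Boolean (conditional) selection keeps rows where the condition is True
print(df[df["Age"] > 25]["Name"].tolist())   # ['Ben', 'Chen']

# df.loc[0] would raise KeyError here, because no row is labelled 0
```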

Data Types
• Numeric: int64, float64
• Character/String: object
• Category: for fixed repeated values
• Check: df.dtypes, Convert: df['col'].astype('float')
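Type conversion is often needed when numbers arrive as text, and the category type suits columns with a few repeated values. The column names and values below are hypothetical.

```python
import pandas as pd

# Hypothetical data: prices stored as text, fuel type with repeated values
df = pd.DataFrame({
    "Price": ["9500", "14200", "11800"],
    "FuelType": ["Petrol", "Diesel", "Petrol"],
})

df["Price"] = df["Price"].astype("float")            # text -> float64
df["FuelType"] = df["FuelType"].astype("category")   # repeated labels -> category

print(df.dtypes)
print(df["FuelType"].cat.categories.tolist())   # ['Diesel', 'Petrol']
```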

Cleaning Data
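• Replace: df['Doors'] = df['Doors'].replace({'three':3,'four':4}) (assigning the result back is safer than inplace=True on a single column, which recent pandas versions warn about)
• Missing values: df.isnull().sum() counts missing values per column (df.isna() is an alias of df.isnull())
• Fill missing numeric: df['Age'] = df['Age'].fillna(df['Age'].mean())
• Fill missing categorical: df['FuelType'] = df['FuelType'].fillna(df['FuelType'].mode()[0])
• Drop missing: df.dropna(), Drop duplicates: df.drop_duplicates() (both return a new DataFrame)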
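The cleaning steps above can be chained on one small table. This is a sketch on hypothetical data; the key point is that `replace`, `fillna`, `dropna` and `drop_duplicates` return new objects, so the result is assigned back.

```python
import numpy as np
import pandas as pd

# Hypothetical messy data: words instead of numbers, missing values, a duplicate row
df = pd.DataFrame({
    "Age": [23.0, np.nan, 27.0, 27.0, 31.0],
    "FuelType": ["Petrol", None, "Petrol", "Petrol", "Diesel"],
    "Doors": ["three", "three", "four", "four", 5],
})

# Replace words with numbers, then make the column integer
df["Doors"] = df["Doors"].replace({"three": 3, "four": 4}).astype(int)

print(df.isna().sum())   # Age 1, FuelType 1, Doors 0

# Alternative to filling: drop every row that has a missing value
dropped = df.dropna()    # 4 rows remain

# Fill numeric gaps with the mean, categorical gaps with the most frequent value
df["Age"] = df["Age"].fillna(df["Age"].mean())                      # mean = 27.0
df["FuelType"] = df["FuelType"].fillna(df["FuelType"].mode()[0])    # 'Petrol'

# Rows 2 and 3 are now identical; keep the first occurrence
df = df.drop_duplicates()
print(df)
```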

Functions & Control Structures


• Python if-else, loops, and functions can be used with DataFrames.
• Example: binning prices into classes (e.g. Low/Medium/High) using a function and apply().
• def price_class(x): return 'Low' if x < 10000 else 'High' (two-class version), then df['PriceClass'] = df['Price'].apply(price_class)
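A three-class version of the binning idea, using if/elif/else inside a function passed to `apply()`. The thresholds (10,000 and 15,000) and prices are hypothetical; `pd.cut` is shown as a vectorised alternative that gives the same labels here.

```python
import pandas as pd

def price_class(x):
    # Hypothetical thresholds; pick ones that suit the actual data
    if x < 10000:
        return "Low"
    elif x < 15000:
        return "Medium"
    else:
        return "High"

df = pd.DataFrame({"Price": [7600, 9500, 11800, 14200, 21500]})

# apply() calls price_class once per value
df["PriceClass"] = df["Price"].apply(price_class)
print(df["PriceClass"].tolist())   # ['Low', 'Low', 'Medium', 'Medium', 'High']

# Vectorised alternative: right=False makes the bins [0, 10000), [10000, 15000), [15000, inf)
df["PriceClass2"] = pd.cut(
    df["Price"],
    bins=[0, 10000, 15000, float("inf")],
    labels=["Low", "Medium", "High"],
    right=False,
)
```

`apply()` is flexible (any Python logic works), while `pd.cut` is faster on large columns and returns a category column.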

Exploratory Data Analysis (EDA)


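• Frequency tables: pd.crosstab(index=df['FuelType'], columns='count')
• Two-way tables: pd.crosstab(df['FuelType'], df['Automatic'])
• Joint Probability: normalize=True (same as normalize='all'); all cells sum to 1
• Marginal Probability: normalize=True together with margins=True; the 'All' row and column hold the marginal probabilities
• Conditional Probability: normalize='index' (each row sums to 1, i.e. P(column | row)) or normalize='columns' (each column sums to 1)
• Correlation: df.corr(method='pearson', numeric_only=True) (numeric_only=True skips text columns, which otherwise raise an error in pandas 2.0+)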
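The sketch below runs each table type on a small hypothetical car dataset, so the different `normalize` and `margins` options can be compared side by side.

```python
import pandas as pd

# Hypothetical sample data (Automatic: 1 = automatic gearbox, 0 = manual)
df = pd.DataFrame({
    "FuelType":  ["Petrol", "Petrol", "Diesel", "Petrol", "Diesel", "CNG"],
    "Automatic": [0, 1, 0, 0, 0, 0],
    "Age":       [23, 31, 27, 45, 38, 50],
    "Price":     [9500, 14200, 11800, 7600, 10100, 6000],
})

# One-way frequency table
freq = pd.crosstab(index=df["FuelType"], columns="count")
print(freq.loc["Petrol", "count"])   # 3

# Joint probabilities: every cell divided by the grand total
joint = pd.crosstab(df["FuelType"], df["Automatic"], normalize=True)

# Marginal probabilities: the 'All' row/column added by margins=True
marg = pd.crosstab(df["FuelType"], df["Automatic"], normalize=True, margins=True)
print(marg.loc["Petrol", "All"])     # 0.5 (3 of 6 cars are Petrol)

# Conditional probabilities P(Automatic | FuelType): each row sums to 1
cond = pd.crosstab(df["FuelType"], df["Automatic"], normalize="index")
print(cond.sum(axis=1))

# Pearson correlation between numeric columns only
corr = df.corr(method="pearson", numeric_only=True)
print(corr.round(2))
```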

Data Visualization with Matplotlib


• Import: import matplotlib.pyplot as plt
• Scatter plot: plt.scatter(df['Age'], df['Price'])
• Histogram: plt.hist(df['KM'], bins=10)
• Bar plot: df['FuelType'].value_counts().plot(kind='bar')
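The three plot types can be drawn together on one figure. This sketch uses hypothetical data and the object-oriented Axes methods, which mirror the `plt.scatter` / `plt.hist` calls in the notes; the non-interactive "Agg" backend is an assumption so the script also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")            # assumption: no display; in a notebook this line is not needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "Age":      [23, 31, 27, 45, 38, 50, 33, 29],
    "Price":    [9500, 14200, 11800, 7600, 10100, 6000, 9900, 12500],
    "KM":       [46986, 72937, 41711, 48000, 38500, 61000, 94612, 75889],
    "FuelType": ["Petrol", "Petrol", "Diesel", "Petrol", "Diesel", "CNG", "Petrol", "Diesel"],
})

fig, axes = plt.subplots(1, 3, figsize=(12, 4))

axes[0].scatter(df["Age"], df["Price"])                      # scatter plot
axes[0].set(xlabel="Age", ylabel="Price", title="Price vs Age")

axes[1].hist(df["KM"], bins=10)                               # histogram with 10 bins
axes[1].set(xlabel="KM", title="Distribution of KM")

# pandas plotting draws onto the given Axes
df["FuelType"].value_counts().plot(kind="bar", ax=axes[2], title="Cars per FuelType")

fig.tight_layout()
# plt.show() would display the figure in an interactive session
```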

Data Visualization with Seaborn


• Import: import seaborn as sns
• sns.scatterplot(x='Age', y='Price', data=df, hue='FuelType')
• sns.histplot(df['Age'], bins=20, kde=True)
• sns.boxplot(x='FuelType', y='Price', data=df)
• sns.pairplot(df, hue='FuelType')

Exporting Data
• Save to CSV: df.to_csv(path, index=False)
• Save to Excel: df.to_excel(path, index=False)
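Why `index=False` matters can be seen by writing a CSV both ways and reading it back. The file names below are hypothetical, and a temporary folder is used so nothing is left on disk.

```python
import os
import tempfile
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"Name": ["Asha", "Ben"], "Age": [23, 31]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cleaned.csv")        # hypothetical file name
    df.to_csv(path, index=False)                   # only the data columns are written
    back = pd.read_csv(path)

    path_idx = os.path.join(tmp, "with_index.csv")
    df.to_csv(path_idx)                            # default: the index is written as an unnamed column
    with_index = pd.read_csv(path_idx)

print(list(back.columns))         # ['Name', 'Age']
print(list(with_index.columns))   # ['Unnamed: 0', 'Name', 'Age']
```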

Summary
• 1. Load data (CSV, Excel, Google Drive).
• 2. Explore with head(), info(), describe().
• 3. Clean: replace, fillna, dropna, convert types.
• 4. Analyze: groupby, crosstab, correlation.
• 5. Visualize: Matplotlib & Seaborn plots.
• 6. Export with to_csv(), to_excel().
