
numpynotes.md 2025-05-17

Awesome, you're now learning both NumPy and Pandas — killer combo for Data Science ⚡

Here are your clean, structured Pandas course notes so far (formatted in Markdown; they can be converted to PDF at any time):

🐼 Pandas Full Course Notes


📦 1. Getting Started
- Install with `pip install pandas`
- Import with `import pandas as pd`
- Start in Google Colab, JupyterLab, or VS Code

🧱 2. DataFrames: The Core of Pandas

```python
data = [[1, 2, 3], [4, 5, 6]]
df = pd.DataFrame(data, columns=['A', 'B', 'C'])
```

📋 DataFrame Attributes

```python
df.shape       # Rows x Columns
df.columns     # Column names
df.index       # Row index
df.dtypes      # Data types
df.info()      # Summary info
df.describe()  # Stats summary
```
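The attributes above can be checked directly on the tiny DataFrame built in the previous section:

```python
import pandas as pd

# Build a small DataFrame and inspect its basic attributes.
data = [[1, 2, 3], [4, 5, 6]]
df = pd.DataFrame(data, columns=['A', 'B', 'C'])

print(df.shape)          # (2, 3) — 2 rows, 3 columns
print(list(df.columns))  # ['A', 'B', 'C']
```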

📂 3. Loading Data

```python
pd.read_csv('file.csv')
pd.read_excel('file.xlsx')
pd.read_parquet('file.parquet')
```

You can load from URLs too.
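`pd.read_csv` accepts a URL string directly (e.g. `pd.read_csv('https://example.com/data.csv')`, a hypothetical address). In this offline sketch, a `StringIO` buffer stands in for the remote file; the call is identical either way:

```python
import io
import pandas as pd

# A StringIO buffer stands in for a remote CSV so the example runs offline.
csv_text = "id,price\n1,9.5\n2,12.0\n"
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)  # (2, 2)
```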

🔍 4. Accessing Data

```python
df.head(), df.tail(), df.sample()

df['column']          # Access a single column
df[['col1', 'col2']]  # Multiple columns

df.loc[rows, cols]    # By label
df.iloc[rows, cols]   # By integer position
df.at[row, 'col']     # Single value by label
df.iat[row, col]      # Fastest single access, by position
```

✏ 5. Modifying Data

```python
df.loc[0, 'column'] = 99
df['new'] = df['a'] + df['b']
df.drop(columns=['old_col'], inplace=True)
df.rename(columns={'old': 'new'}, inplace=True)
df.copy()  # Avoid shared-memory issues
```
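Why `copy()` matters: selecting columns can return a view that shares memory with the original, so writes may leak back (or trigger pandas' `SettingWithCopyWarning`). A minimal sketch with invented sample data:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

safe = df[['a']].copy()   # independent copy, safe to modify
safe.loc[0, 'a'] = 99

print(df.loc[0, 'a'])     # 1 — the original is untouched
```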

🧠 6. Filtering Data

```python
df[df['age'] > 30]

df[(df['age'] > 30) & (df['city'] == 'NY')]
df.query("age > 30 and city == 'NY'")
df[df['name'].str.contains('John', case=False)]
```
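A runnable end-to-end version of the filters above, using invented sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John Doe', 'Ann Lee', 'johnny'],
    'age':  [35, 28, 41],
    'city': ['NY', 'LA', 'NY'],
})

# Combined boolean conditions and case-insensitive string matching:
over_30_ny = df[(df['age'] > 30) & (df['city'] == 'NY')]
johns = df[df['name'].str.contains('john', case=False)]

print(len(over_30_ny))  # 2
print(len(johns))       # 2
```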

📅 7. String & Date Functions

```python
df['first_name'] = df['name'].str.split().str[0]
df['birth_date'] = pd.to_datetime(df['birth_date'])
df['year'] = df['birth_date'].dt.year
df['is_leap'] = df['birth_date'].dt.is_leap_year
```

🛠 8. Handling Missing Data

```python
df.isna(), df.notna()
df.fillna(value)
df.interpolate()
df.dropna(subset=['column'], inplace=True)
```
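A small worked example on a Series with gaps (sample data invented here). Note that linear `interpolate()` fills interior gaps from their neighbours and carries the last valid value forward past the end:

```python
import pandas as pd
import numpy as np

s = pd.Series([1.0, np.nan, 3.0, np.nan])

print(s.isna().sum())            # 2 missing values
print(s.fillna(0).tolist())      # [1.0, 0.0, 3.0, 0.0]
print(s.interpolate().tolist())  # [1.0, 2.0, 3.0, 3.0]
```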

📊 9. Aggregating Data


```python
df['col'].value_counts()
df.groupby('col').sum()
df.groupby('col').agg({'price': 'mean', 'qty': 'sum'})
```
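The `agg` call above, run on invented sample data, applies a different aggregation per column:

```python
import pandas as pd

df = pd.DataFrame({
    'shop':  ['A', 'A', 'B'],
    'price': [10.0, 20.0, 30.0],
    'qty':   [1, 2, 3],
})

agg = df.groupby('shop').agg({'price': 'mean', 'qty': 'sum'})

print(agg.loc['A', 'price'])  # 15.0 — mean of 10.0 and 20.0
print(agg.loc['B', 'qty'])    # 3
```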

🧾 10. Pivot Tables

```python
df.pivot(index='day', columns='type', values='sales')
```
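A minimal sketch with invented data. Note that `pivot` requires unique (index, columns) pairs; if they repeat, `pivot_table` (which aggregates duplicates) is the right tool:

```python
import pandas as pd

df = pd.DataFrame({
    'day':   ['Mon', 'Mon', 'Tue'],
    'type':  ['food', 'drink', 'food'],
    'sales': [10, 5, 7],
})

table = df.pivot(index='day', columns='type', values='sales')
print(table.loc['Mon', 'food'])  # 10

# With duplicate (day, type) pairs, aggregate instead:
# df.pivot_table(index='day', columns='type', values='sales', aggfunc='sum')
```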

🔄 11. Merging & Concatenating

```python
pd.merge(df1, df2, on='id', how='left')

pd.concat([df1, df2], axis=0)
```
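A runnable sketch of both operations on invented frames. A left merge keeps every row of `df1` and fills unmatched keys with `NaN`:

```python
import pandas as pd

df1 = pd.DataFrame({'id': [1, 2], 'name': ['a', 'b']})
df2 = pd.DataFrame({'id': [1, 3], 'score': [0.9, 0.5]})

merged = pd.merge(df1, df2, on='id', how='left')
print(merged['score'].tolist())  # [0.9, nan] — id 2 has no match

stacked = pd.concat([df1, df1], axis=0, ignore_index=True)
print(len(stacked))  # 4
```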

🚨 12. Advanced Functions

```python
df['shifted'] = df['sales'].shift(1)
df['cumsum'] = df['sales'].cumsum()
df['rank'] = df['sales'].rank()
df['rolling_mean'] = df['sales'].rolling(3).mean()
```
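The same window functions on a tiny invented series, with the values they produce:

```python
import pandas as pd

df = pd.DataFrame({'sales': [10, 20, 30, 40]})

df['shifted'] = df['sales'].shift(1)                # NaN, 10, 20, 30
df['cumsum'] = df['sales'].cumsum()                 # 10, 30, 60, 100
df['rolling_mean'] = df['sales'].rolling(3).mean()  # NaN, NaN, 20.0, 30.0

print(df['cumsum'].tolist())        # [10, 30, 60, 100]
print(df['rolling_mean'].iloc[-1])  # 30.0
```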

🧪 13. Apply & Lambda

```python
df['category'] = df['height'].apply(lambda x: 'Tall' if x > 180 else 'Short')
```

Or with a custom function:

```python
def classify(row):
    if row['height'] > 180 and row['weight'] > 80:
        return 'Heavy'
    return 'Light'

df['type'] = df.apply(classify, axis=1)
```

💾 14. Save Your Work


3/4
numpynotes.md 2025-05-17

```python
df.to_csv('data.csv', index=False)
df.to_excel('data.xlsx')
df.to_parquet('data.parquet')
```

If you want this in PDF, I can:

1. Convert it to HTML or LaTeX (for offline editing)
2. Send you a Python script that auto-generates the PDF
3. Help you paste it into a markdown-to-pdf tool

Which way works best for you? Want me to do the same style for your Pandas + NumPy combo PDF?
