
📘 Pandas Study Guide

The Pandas Study Guide provides an overview of essential Pandas functionalities, including importing data, exploring DataFrames, selecting and filtering data, and performing aggregation and mathematical operations. It covers practical examples and tips for data cleaning, creating new columns, and managing missing values. Key methods such as .loc, .iloc, and groupby() are highlighted for effective data manipulation.

Uploaded by tazshakas


🔧 1. Pandas Basics
a. Importing Pandas

import pandas as pd

Always start with this line to use pandas.

b. Reading CSV Files

df = pd.read_csv("url_or_path")

# Example: building a DataFrame directly from a dictionary (no file needed)
data = {
    'regiment': ['Nighthawks', 'Dragoons', 'Scouts'],
    'deaths': [523, 234, 62],
    'origin': ['Arizona', 'Iowa', 'Oregon']
}
df = pd.DataFrame(data)

# Example with tab-separated data (sep='\t')
url = 'https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv'
chipo = pd.read_csv(url, sep='\t')

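pd.read_csv() loads a CSV file into a DataFrame. The sep parameter sets the delimiter: a comma by default, '\t' for tab-separated files.
It works the same way with a local path: pd.read_csv("data.csv").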
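read_csv accepts a file path, a URL, or a file-like object. The minimal sketch below needs no network access or files on disk. It assumes small in-memory strings wrapped in io.StringIO can stand in for real files. The sample rows are hypothetical. It shows the default comma separator next to sep='\t':

```python
import io

import pandas as pd

# In-memory text stands in for real files (hypothetical sample data)
csv_text = "regiment,deaths,origin\nNighthawks,523,Arizona\nDragoons,234,Iowa\n"
tsv_text = "item_name\tquantity\nChicken Bowl\t2\nChips\t1\n"

df_csv = pd.read_csv(io.StringIO(csv_text))           # comma is the default separator
df_tsv = pd.read_csv(io.StringIO(tsv_text), sep="\t")  # tab-separated input

print(df_csv.shape)          # (2, 3)
print(list(df_tsv.columns))  # ['item_name', 'quantity']
```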

🧾 2. DataFrame Exploration
a. Displaying Rows

df.head() # First 5 rows


df.tail() # Last 5 rows

b. Basic Info


df.info() # Summary (columns, nulls, types)


df.describe() # Statistical summary for numeric columns

c. Shape and Columns

df.shape # Tuple of (rows, columns)


df.columns # Index of column names (list(df.columns) gives a plain list)
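Here is a quick sketch of the exploration calls on the small regiment data from section 1. head() and tail() also take an optional row count. describe() only summarises numeric columns, so here it reports on 'deaths' alone:

```python
import pandas as pd

# Reuses the regiment example data from section 1
df = pd.DataFrame({
    "regiment": ["Nighthawks", "Dragoons", "Scouts"],
    "deaths": [523, 234, 62],
    "origin": ["Arizona", "Iowa", "Oregon"],
})

print(df.head(2))        # first 2 rows instead of the default 5
print(df.shape)          # (3, 3)
print(list(df.columns))  # ['regiment', 'deaths', 'origin']
print(df.describe())     # statistics for the only numeric column, 'deaths'
```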

d. Sorting Data

df.sort_values(by='Goals', ascending=False) # Sort by Goals, descending


df.sort_values(by=['Group', 'Goals'], ascending=[True, False]) # Multi-column sort: Group A-Z, then Goals high-to-low

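sort_values : Sorts the DataFrame by one or more columns and returns a new, sorted DataFrame. The original df is unchanged unless you reassign the result or pass inplace=True.


ascending : True for ascending, False for descending. Pass a list to set the direction for each column.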
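The next sketch shows a multi-column sort on a few hypothetical rows with made-up numbers, using the Team, Group and Goals column names the guide uses. It also shows that the original DataFrame keeps its order:

```python
import pandas as pd

# Hypothetical tournament data (made-up numbers)
df = pd.DataFrame({
    "Team": ["Spain", "Germany", "Greece", "Italy"],
    "Group": ["C", "B", "A", "C"],
    "Goals": [12, 10, 5, 6],
})

# Group A-Z first, then Goals high-to-low within each group
ranked = df.sort_values(by=["Group", "Goals"], ascending=[True, False])
print(ranked["Team"].tolist())  # ['Greece', 'Germany', 'Spain', 'Italy']

# sort_values returned a new DataFrame; df keeps its original row order
print(df["Team"].tolist())      # ['Spain', 'Germany', 'Greece', 'Italy']
```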

🔍 3. Selecting and Filtering


a. Selecting Columns

df['Goals'] # Single column (returns Series)


df[['Team', 'Goals']] # Multiple columns (returns DataFrame)

b. Filtering Rows

df[df['Goals'] > 5] # Filter by condition


df[df['Team'] == 'Germany'] # Exact match
df[df['Team'].str.startswith('G')] # Filter rows where Team starts with 'G'

str.startswith : Returns a boolean mask marking the strings that begin with the given prefix (case-sensitive). Put it inside df[...] to filter rows.
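Every filter above works the same way: the condition builds a boolean Series, and df[mask] keeps the rows marked True. The sketch below uses hypothetical data, including a lowercase team name, to show that startswith is case-sensitive. Lowercasing first with str.lower() is one way to match regardless of case:

```python
import pandas as pd

# Hypothetical data; 'georgia' is deliberately lowercase
df = pd.DataFrame({
    "Team": ["Germany", "Greece", "Spain", "georgia"],
    "Goals": [10, 5, 12, 3],
})

mask = df["Goals"] > 5            # one True/False per row
print(mask.tolist())              # [True, False, True, False]
print(df[mask]["Team"].tolist())  # ['Germany', 'Spain']

# Case-sensitive: 'georgia' does not start with 'G'
starts_g = df[df["Team"].str.startswith("G")]
print(starts_g["Team"].tolist())  # ['Germany', 'Greece']

# Lowercase first to ignore case
starts_g_any = df[df["Team"].str.lower().str.startswith("g")]
print(starts_g_any["Team"].tolist())  # ['Germany', 'Greece', 'georgia']
```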

c. Conditional with isin

df[df['Team'].isin(['Germany', 'Spain'])]
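isin checks each value against a list of allowed values. Prefixing the mask with ~ inverts it, which selects every row not in the list. The sketch below assumes a few hypothetical teams:

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({"Team": ["Germany", "Greece", "Spain", "Italy"],
                   "Goals": [10, 5, 12, 6]})

wanted = df[df["Team"].isin(["Germany", "Spain"])]
print(wanted["Team"].tolist())  # ['Germany', 'Spain']

# ~ negates the boolean mask: teams NOT in the list
others = df[~df["Team"].isin(["Germany", "Spain"])]
print(others["Team"].tolist())  # ['Greece', 'Italy']
```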

d. Selecting with .loc (Label-based)


df.loc[0] # Select row by index label


df.loc[0:2, ['Team', 'Goals']] # Select rows with labels 0-2 (end label included) and specific columns
df.loc[df['Goals'] > 5, 'Team'] # Select Team column where Goals > 5

.loc : Access rows and columns by labels or boolean conditions. Label slices include the end label.
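Here is a small check of the two .loc patterns, assuming hypothetical data with the default 0..n-1 index. The slice 0:2 returns three rows because .loc includes the end label. With a mask as the row selector, a single column label returns a Series:

```python
import pandas as pd

# Hypothetical data with the default 0..n-1 index
df = pd.DataFrame({"Team": ["Croatia", "Czech Republic", "Denmark", "England"],
                   "Goals": [4, 6, 4, 7]})

first_three = df.loc[0:2, ["Team", "Goals"]]    # end label 2 is INCLUDED
print(len(first_three))                         # 3

high_scorers = df.loc[df["Goals"] > 5, "Team"]  # mask picks rows, label picks column
print(high_scorers.tolist())                    # ['Czech Republic', 'England']

print(df.loc[0])  # a single row comes back as a Series indexed by column name
```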

e. Selecting with .iloc (Index-based)

df.iloc[0] # Select first row


df.iloc[0:3, 1:3] # Select rows 0-2 and columns 1-2

.iloc : Access rows and columns by integer positions. Slices exclude the end position, as with Python lists.
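The difference between the two accessors is easiest to see when the index is not 0..n-1. The sketch below indexes hypothetical data by team name; the "Shots" column is made up for illustration. .iloc still counts positions, while .loc uses the names:

```python
import pandas as pd

# Hypothetical data indexed by team name; 'Shots' is an illustrative column
df = pd.DataFrame({"Goals": [4, 6, 4, 7], "Shots": [13, 19, 10, 18]},
                  index=["Croatia", "Czech Republic", "Denmark", "England"])

# .iloc: positions, end EXCLUDED (like list slicing)
by_position = df.iloc[0:2]
print(by_position.index.tolist())  # ['Croatia', 'Czech Republic']

# .loc: labels, end label INCLUDED
by_label = df.loc["Croatia":"Denmark"]
print(by_label.index.tolist())     # ['Croatia', 'Czech Republic', 'Denmark']

print(df.iloc[0, 1])  # 13 -> first row, second column (Shots)
```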

📐 4. Aggregation and Grouping


a. Using len() to Count

num_teams = len(df)

b. Grouping

df.groupby('Group')['Goals'].mean() # Mean goals per group
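groupby splits the rows by the values of a column, then reduces each group to a single result. The sketch below uses hypothetical data. It also shows .agg, which computes several statistics per group in one call:

```python
import pandas as pd

# Hypothetical data: two groups of two teams
df = pd.DataFrame({
    "Team": ["Spain", "Italy", "Germany", "Portugal"],
    "Group": ["C", "C", "B", "B"],
    "Goals": [12, 6, 10, 6],
})

mean_goals = df.groupby("Group")["Goals"].mean()
print(mean_goals.to_dict())  # {'B': 8.0, 'C': 9.0}

# Several statistics per group at once
summary = df.groupby("Group")["Goals"].agg(["mean", "max", "count"])
print(summary)
```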

c. Applying Functions with .apply

df['Goals_Doubled'] = df['Goals'].apply(lambda x: x * 2) # Double each Goals value


df['Team_Category'] = df['Team'].apply(lambda x: 'Elite' if x in ['Germany', 'Spain'] else 'Other') # 'Other' is a placeholder label for all remaining teams

.apply : Series.apply calls a function on each element. DataFrame.apply calls it on each column, or on each row with axis=1.
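The sketch below contrasts the two forms of .apply on hypothetical data. Simple arithmetic such as doubling already works element-wise on a whole column, so apply is mainly needed for logic a plain expression cannot express. The "Summary" column is made up for illustration:

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({"Team": ["Germany", "Greece", "Spain"],
                   "Goals": [10, 5, 12]})

# Series.apply: the function receives one value at a time
df["Goals_Doubled"] = df["Goals"].apply(lambda x: x * 2)

# Column arithmetic gives the same result without apply
same = (df["Goals_Doubled"] == df["Goals"] * 2).all()
print(same)  # True

# DataFrame.apply with axis=1: the function receives one row (a Series) at a time
df["Summary"] = df.apply(lambda row: f"{row['Team']}: {row['Goals']}", axis=1)
print(df["Summary"].tolist())  # ['Germany: 10', 'Greece: 5', 'Spain: 12']
```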

🧮 5. Math and Stats


df['Goals'].sum() # Total
df['Goals'].mean() # Average
df['Goals'].max() # Max value
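These reductions skip missing values by default (skipna=True). Here is a minimal sketch on a hypothetical Series with one NaN:

```python
import numpy as np
import pandas as pd

goals = pd.Series([10, 5, np.nan, 12])  # hypothetical, one missing value

print(goals.sum())              # 27.0 -> the NaN is skipped
print(goals.mean())             # 9.0  -> average of the 3 non-missing values
print(goals.max())              # 12.0
print(goals.count())            # 3    -> number of non-missing values
print(goals.sum(skipna=False))  # nan  -> NaN propagates when skipping is off
```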


✂️ 6. Column Operations


a. Creating New Columns

df['Goals per Match'] = df['Goals'] / df['Matches Played']
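Arithmetic between two columns is applied row by row, so one line creates the whole new column. Here is a sketch with hypothetical values:

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({"Team": ["Germany", "Greece", "Spain"],
                   "Goals": [10, 5, 12],
                   "Matches Played": [5, 4, 6]})

# Each row's Goals is divided by the same row's Matches Played
df["Goals per Match"] = df["Goals"] / df["Matches Played"]
print(df["Goals per Match"].tolist())  # [2.0, 1.25, 2.0]
```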

b. Renaming Columns

df.rename(columns={'old': 'new'}, inplace=True)
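'old' and 'new' above are placeholders for real column names. Without inplace=True, rename returns a new DataFrame, and reassigning that result is a common alternative style. The sketch below assumes hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({"Team": ["Germany"], "Goals": [10]})  # hypothetical

# Map old name -> new name; columns not listed keep their names
df = df.rename(columns={"Goals": "Total Goals"})
print(list(df.columns))  # ['Team', 'Total Goals']
```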

c. Setting Index

df.set_index('Team', inplace=True) # Set Team column as index


df.reset_index(inplace=True) # Reset index to default

set_index : Sets a column as the DataFrame index.


reset_index : Reverts index to default integer index.
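This round trip, sketched on hypothetical data, shows what each call does. After set_index, team names work as .loc labels. After reset_index, Team becomes an ordinary column again and the index returns to 0..n-1:

```python
import pandas as pd

df = pd.DataFrame({"Team": ["Germany", "Greece", "Spain"],
                   "Goals": [10, 5, 12]})  # hypothetical

df = df.set_index("Team")              # Team values become row labels
spain_goals = df.loc["Spain", "Goals"]
print(spain_goals)                     # 12

df = df.reset_index()                  # Team is a regular column again
print(list(df.columns))                # ['Team', 'Goals']
print(df.index.tolist())               # [0, 1, 2]
```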

🧼 7. Data Cleaning
a. Checking for Missing Values

df.isnull().sum() # Number of missing values in each column
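isnull() returns a True/False table of the same shape as df. Summing it counts the True values (the missing entries) in each column. Here is a sketch with hypothetical data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Team": ["Germany", "Greece", "Spain"],
                   "Goals": [10, np.nan, 12],
                   "Red Cards": [0, 1, np.nan]})  # hypothetical

missing = df.isnull().sum()
print(missing.to_dict())  # {'Team': 0, 'Goals': 1, 'Red Cards': 1}
```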

b. Dropping Columns or Rows

df.drop(columns=['Red Cards'], inplace=True) # Drop a column by name
df.dropna(inplace=True) # Drop every row that has at least one missing value
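The order of these calls matters, because dropna() removes a row if any column is missing. This sketch uses hypothetical data and omits inplace=True, so each call returns a new DataFrame and df stays unchanged. The subset parameter limits the check to chosen columns:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Team": ["Germany", "Greece", "Spain"],
                   "Goals": [10, np.nan, 12],
                   "Red Cards": [0, 1, np.nan]})  # hypothetical

# Dropping 'Red Cards' first means Spain's missing value no longer matters
no_cards = df.drop(columns=["Red Cards"]).dropna()
print(no_cards["Team"].tolist())      # ['Germany', 'Spain']

# dropna() on the full frame removes every row with ANY missing value
complete = df.dropna()
print(complete["Team"].tolist())      # ['Germany']

# subset= checks only the listed columns
goals_known = df.dropna(subset=["Goals"])
print(goals_known["Team"].tolist())   # ['Germany', 'Spain']
```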

c. Removing Duplicates

df.drop_duplicates(subset=['Team'], keep='first', inplace=True)

drop_duplicates : Removes duplicate rows based on specified columns.


subset : Columns to check for duplicates.
keep='first' : Keeps the first occurrence. keep='last' keeps the last one, and keep=False drops every duplicated row.
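The three keep options give different results when a team appears twice. Here is a sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical data: Spain appears twice
df = pd.DataFrame({"Team": ["Spain", "Italy", "Spain"],
                   "Goals": [12, 6, 3]})

first = df.drop_duplicates(subset=["Team"], keep="first")
last = df.drop_duplicates(subset=["Team"], keep="last")
none = df.drop_duplicates(subset=["Team"], keep=False)

print(first["Goals"].tolist())  # [12, 6] -> first Spain row kept
print(last["Goals"].tolist())   # [6, 3]  -> last Spain row kept
print(none["Goals"].tolist())   # [6]     -> both Spain rows dropped
```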

📚 Examples You Should Try


# 1. Get teams with more than 6 goals
df[df['Goals'] > 6]

# 2. Find number of teams in the dataset


len(df)

# 3. Get top 3 teams with most yellow cards


df.sort_values(by='Yellow Cards', ascending=False).head(3)

# 4. Teams that received no red cards


df[df['Red Cards'] == 0]

# 5. Create a new column: 'Goals per Match'


df['Goals per Match'] = df['Goals'] / df['Matches Played']

# 6. Filter teams starting with 'S'


df[df['Team'].str.startswith('S')]

# 7. Select first 2 rows and 'Team', 'Goals' columns using .loc


df.loc[0:1, ['Team', 'Goals']]

# 8. Select first 2 rows and first 2 columns using .iloc


df.iloc[0:2, 0:2]

# 9. Double goals using .apply


df['Goals_Doubled'] = df['Goals'].apply(lambda x: x * 2)

# 10. Remove duplicate teams


df.drop_duplicates(subset=['Team'], keep='first') # Returns a new DataFrame; assign it (or pass inplace=True) to keep the result

# 11. Set 'Team' as index and select rows where Goals > 5
df_by_team = df.set_index('Team')
df_by_team.loc[df_by_team['Goals'] > 5] # build the mask from the re-indexed frame so the labels match
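The examples above assume a DataFrame with Team, Goals, Matches Played, Yellow Cards and Red Cards columns, but the guide never shows it being loaded. To try them offline, the sketch below builds a hypothetical stand-in with made-up values and runs a few of the examples against it. For example 11, the boolean mask must come from the re-indexed frame. A mask built from the original df carries 0..n-1 labels that do not match the team-name index, and pandas rejects it:

```python
import pandas as pd

# Hypothetical stand-in for the dataset the examples assume (values are made up)
df = pd.DataFrame({
    "Team": ["Germany", "Spain", "Sweden", "Greece"],
    "Goals": [10, 12, 5, 7],
    "Matches Played": [5, 6, 3, 4],
    "Yellow Cards": [4, 11, 7, 9],
    "Red Cards": [0, 0, 0, 1],
})

over_six = df[df["Goals"] > 6]["Team"].tolist()  # example 1
print(over_six)                                  # ['Germany', 'Spain', 'Greece']

top_yellow = df.sort_values(by="Yellow Cards", ascending=False).head(3)["Team"].tolist()
print(top_yellow)                                # ['Spain', 'Greece', 'Sweden']

starts_s = df[df["Team"].str.startswith("S")]["Team"].tolist()  # example 6
print(starts_s)                                  # ['Spain', 'Sweden']

# Example 11: mask and frame share the same (Team) index
by_team = df.set_index("Team")
over_five = by_team.loc[by_team["Goals"] > 5].index.tolist()
print(over_five)                                 # ['Germany', 'Spain', 'Greece']
```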

✅ Tips to Remember
df[df['col'] > value] is your main tool for filtering.
Use .head() to preview data often.
Missing values appear as NaN, and aggregations such as sum() and mean() skip them by default, so always check for them with .isnull().sum().
Column operations (df['new'] = ...) let you engineer features quickly.
groupby() is powerful for grouped aggregation (like finding averages by category).

Use .loc for label-based selection, .iloc for position-based selection.
sort_values helps organize data; combine with head() for top/bottom rows.
.apply is flexible for custom transformations.
Use drop_duplicates to ensure unique data entries.
set_index is useful for making a column the index for easier lookups.
