📘 Pandas Study Guide
🔧 1. Pandas Basics
a. Importing Pandas
import pandas as pd
Always start with this line to use pandas.
b. Reading CSV Files
df = pd.read_csv("url_or_path")
# Alternative without a file: build a DataFrame from a dictionary
data = {
'regiment': ['Nighthawks', 'Dragoons', 'Scouts'],
'deaths': [523, 234, 62],
'origin': ['Arizona', 'Iowa', 'Oregon']
}
df = pd.DataFrame(data)
# Example with tab-separated data
url = 'https://raw.githubusercontent.com/justmarkham/DAT8/master/data/chipotle.tsv'
chipo = pd.read_csv(url, sep='\t')
read_csv loads a delimited text file into a DataFrame; pass sep='\t' for tab-separated files.
Local paths work as well: pd.read_csv("data.csv").
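As a minimal sketch of the sep parameter, the snippet below parses tab-separated text from an in-memory string instead of a URL, so it runs without network access. The sample rows and the io.StringIO wrapper are assumptions for illustration and are not taken from the chipotle dataset.

```python
import io
import pandas as pd

# Hypothetical tab-separated text standing in for a .tsv file
tsv_text = "Team\tGoals\nGermany\t10\nSpain\t12\n"

# read_csv accepts file-like objects as well as paths and URLs
sample = pd.read_csv(io.StringIO(tsv_text), sep="\t")
print(sample)
```

Without sep="\t", each line would likely be read as a single column, because the default separator is a comma.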
🧾 2. DataFrame Exploration
a. Displaying Rows
df.head() # First 5 rows
df.tail() # Last 5 rows
b. Basic Info
df.info() # Summary (columns, nulls, types)
df.describe() # Statistical summary for numeric columns
c. Shape and Columns
df.shape # Tuple of (rows, columns)
df.columns # Index of column names (use list(df.columns) for a plain list)
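The exploration calls above can be tried on a small frame. This is a sketch with made-up data; the team names and goal counts are assumptions chosen to match the column names used elsewhere in this guide.

```python
import pandas as pd

# Hypothetical data; column names follow the guide's football examples
df = pd.DataFrame({
    "Team": ["Germany", "Spain", "Italy", "Greece"],
    "Goals": [10, 12, 6, 5],
})

print(df.shape)          # (rows, columns) tuple
print(list(df.columns))  # df.columns is an Index; list() gives a plain list
print(df.head(2))        # head() and tail() accept a row count
```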
d. Sorting Data
df.sort_values(by='Goals', ascending=False) # Sort by Goals, descending
df.sort_values(by=['Group', 'Goals'], ascending=[True, False]) # Multi-column sort
sort_values: Sorts the DataFrame by one or more columns and returns a new DataFrame.
ascending=True sorts in ascending order; False sorts in descending order.
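A short sketch of the multi-column sort, using hypothetical group assignments and goal counts. It also illustrates that the original frame is left unchanged unless the result is assigned back.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "Team": ["Germany", "Spain", "Italy", "Greece"],
    "Group": ["B", "C", "C", "A"],
    "Goals": [10, 12, 6, 5],
})

# Group ascending, then Goals descending within each group
ranked = df.sort_values(by=["Group", "Goals"], ascending=[True, False])
print(ranked)

# sort_values returns a new DataFrame; df keeps its original row order
print(df)
```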
🔍 3. Selecting and Filtering
a. Selecting Columns
df['Goals'] # Single column (returns Series)
df[['Team', 'Goals']] # Multiple columns (returns DataFrame)
b. Filtering Rows
df[df['Goals'] > 5] # Filter by condition
df[df['Team'] == 'Germany'] # Exact match
df[df['Team'].str.startswith('G')] # Filter rows where Team starts with 'G'
str.startswith: Keeps rows whose string value begins with the given prefix (case-sensitive).
c. Conditional with isin
df[df['Team'].isin(['Germany', 'Spain'])]
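Each filter above is a boolean mask, and masks can be combined. The sketch below uses hypothetical data; combining conditions with & and | is an extension that the guide does not cover explicitly.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "Team": ["Germany", "Spain", "Italy", "Greece"],
    "Goals": [10, 12, 6, 5],
})

# Each comparison produces a boolean Series (a mask)
high_scoring = df["Goals"] > 5
starts_with_g = df["Team"].str.startswith("G")

# Combine masks with & (and), | (or), ~ (not); parenthesise inline comparisons
both = df[high_scoring & starts_with_g]
either = df[df["Team"].isin(["Germany", "Spain"]) | (df["Goals"] < 6)]
print(both)
print(either)
```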
d. Selecting with .loc (Label-based)
df.loc[0] # Select row by index label
df.loc[0:2, ['Team', 'Goals']] # Select rows labeled 0 through 2 (end included) and specific columns
df.loc[df['Goals'] > 5, 'Team'] # Select Team column where Goals > 5
.loc: Access rows and columns by labels or boolean conditions; label slices include the end label.
e. Selecting with .iloc (Position-based)
df.iloc[0] # Select first row
df.iloc[0:3, 1:3] # Select rows 0-2 and columns 1-2
.iloc: Access rows and columns by integer positions; slices exclude the end position.
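The practical difference between the two accessors shows up in slicing. A minimal sketch, assuming a default integer index and hypothetical data:

```python
import pandas as pd

# Hypothetical data with the default 0..n-1 integer index
df = pd.DataFrame({
    "Team": ["Germany", "Spain", "Italy", "Greece"],
    "Goals": [10, 12, 6, 5],
})

by_label = df.loc[0:2, ["Team", "Goals"]]  # labels 0, 1, 2 (end included)
by_position = df.iloc[0:2, 0:2]            # positions 0, 1 (end excluded)

print(len(by_label), len(by_position))
```

With a default index the labels and positions happen to coincide, which is why the inclusive versus exclusive end is easy to miss.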
📐 4. Aggregation and Grouping
a. Using len() to Count
num_teams = len(df)
b. Grouping
df.groupby('Group')['Goals'].mean() # Mean goals per group
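A small worked example of the grouped mean, using hypothetical group assignments. The result is a Series indexed by the group labels.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "Team": ["Germany", "Spain", "Italy", "Greece"],
    "Group": ["B", "C", "C", "A"],
    "Goals": [10, 12, 6, 5],
})

# One mean per distinct value of Group
mean_goals = df.groupby("Group")["Goals"].mean()
print(mean_goals)
```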
c. Applying Functions with .apply
df['Goals_Doubled'] = df['Goals'].apply(lambda x: x * 2) # Double each Goals value
df['Team_Category'] = df['Team'].apply(lambda x: 'Elite' if x in ['Germany', 'Spai
.apply: On a Series, applies a function to each element; on a DataFrame, applies it to each column (or to each row with axis=1).
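The Team_Category line above is cut off in this copy of the guide, so the sketch below is one plausible way to write such a categorisation, not the original code. The list of elite teams and the 'Other' fallback label are assumptions.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "Team": ["Germany", "Spain", "Italy"],
    "Goals": [10, 12, 6],
})

elite_teams = ["Germany", "Spain"]  # assumed list; the original line is truncated

# 'Other' is an assumed fallback label, not taken from the guide
df["Team_Category"] = df["Team"].apply(
    lambda x: "Elite" if x in elite_teams else "Other"
)

# For plain arithmetic, a vectorized expression gives the same result as apply
df["Goals_Doubled"] = df["Goals"] * 2
print(df)
```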
🧮 5. Math and Stats
df['Goals'].sum() # Total
df['Goals'].mean() # Average
df['Goals'].max() # Max value
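These aggregations skip missing values by default. A minimal sketch with a hypothetical Series that contains one NaN:

```python
import pandas as pd

# Hypothetical goal counts with one missing value
goals = pd.Series([10, 12, float("nan"), 6], name="Goals")

print(goals.sum())   # NaN is skipped by default (skipna=True)
print(goals.mean())  # averages the three non-missing values
print(goals.max())
```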
✂️ 6. Column Operations
a. Creating New Columns
df['Goals per Match'] = df['Goals'] / df['Matches Played']
b. Renaming Columns
df.rename(columns={'old': 'new'}, inplace=True)
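A sketch of both operations on hypothetical data. The short name 'GPM' is an assumption for illustration. Here rename is used without inplace=True, so it returns a new DataFrame and leaves the original untouched.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "Team": ["Germany", "Spain"],
    "Goals": [10, 12],
    "Matches Played": [5, 6],
})

# Arithmetic between columns is applied row by row
df["Goals per Match"] = df["Goals"] / df["Matches Played"]

# Without inplace=True, rename returns a renamed copy
renamed = df.rename(columns={"Goals per Match": "GPM"})
print(renamed)
```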
c. Setting Index
df.set_index('Team', inplace=True) # Set Team column as index
df.reset_index(inplace=True) # Reset index to default
set_index: Moves a column into the DataFrame index, so rows can be looked up by its values.
reset_index: Restores the default integer index and moves the old index back into a regular column.
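A round trip through set_index and reset_index, sketched on hypothetical data. The non-inplace form is used so each step has its own name.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "Team": ["Germany", "Spain", "Italy"],
    "Goals": [10, 12, 6],
})

indexed = df.set_index("Team")         # Team labels become the index
print(indexed.loc["Spain", "Goals"])   # label lookup by team name

restored = indexed.reset_index()       # Team returns as an ordinary column
print(restored)
```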
🧼 7. Data Cleaning
a. Checking for Missing Values
df.isnull().sum()
b. Dropping Columns or Rows
df.drop(columns=['Red Cards'], inplace=True)
df.dropna(inplace=True)
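Since dropna() with no arguments removes every row that has any missing value, it can discard more than intended. The sketch below uses hypothetical data and shows the subset parameter as one way to narrow the check.

```python
import pandas as pd

# Hypothetical data with missing entries in two different columns
df = pd.DataFrame({
    "Team": ["Germany", "Spain", "Italy"],
    "Goals": [10, None, 6],
    "Red Cards": [0, 1, None],
})

missing_per_column = df.isnull().sum()
print(missing_per_column)

# No arguments: drop every row that has at least one missing value
complete_rows = df.dropna()

# subset= limits the check to the named columns
has_goals = df.dropna(subset=["Goals"])
print(len(complete_rows), len(has_goals))
```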
c. Removing Duplicates
df.drop_duplicates(subset=['Team'], keep='first', inplace=True)
drop_duplicates: Removes duplicate rows based on the specified columns.
subset: Columns to check for duplicates.
keep='first': Keeps the first occurrence; use keep='last' to keep the last one, or keep=False to drop every duplicated row.
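The three keep options can be compared side by side. A minimal sketch with a hypothetical frame in which Germany appears twice:

```python
import pandas as pd

# Hypothetical data: Germany is duplicated at rows 0 and 2
df = pd.DataFrame({
    "Team": ["Germany", "Spain", "Germany"],
    "Goals": [10, 12, 11],
})

first = df.drop_duplicates(subset=["Team"], keep="first")  # keeps row 0 for Germany
last = df.drop_duplicates(subset=["Team"], keep="last")    # keeps row 2 for Germany
none = df.drop_duplicates(subset=["Team"], keep=False)     # drops both Germany rows

print(list(first.index), list(last.index), list(none.index))
```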
📚 Examples You Should Try
# 1. Get teams with more than 6 goals
df[df['Goals'] > 6]
# 2. Find number of teams in the dataset
len(df)
# 3. Get top 3 teams with most yellow cards
df.sort_values(by='Yellow Cards', ascending=False).head(3)
# 4. Teams that received no red cards
df[df['Red Cards'] == 0]
# 5. Create a new column: 'Goals per Match'
df['Goals per Match'] = df['Goals'] / df['Matches Played']
# 6. Filter teams starting with 'S'
df[df['Team'].str.startswith('S')]
# 7. Select first 2 rows and 'Team', 'Goals' columns using .loc
df.loc[0:1, ['Team', 'Goals']]
# 8. Select first 2 rows and first 2 columns using .iloc
df.iloc[0:2, 0:2]
# 9. Double goals using .apply
df['Goals_Doubled'] = df['Goals'].apply(lambda x: x * 2)
# 10. Remove duplicate teams
df.drop_duplicates(subset=['Team'], keep='first')
# 11. Set 'Team' as index and select rows where Goals > 5
df.set_index('Team').loc[lambda d: d['Goals'] > 5] # Callable mask is built on the re-indexed frame, so the indexes align
✅ Tips to Remember
df[df['col'] > value] is your main tool for filtering.
Use .head() to preview data often.
Missing values appear as NaN and are skipped by default in aggregations such as sum() and mean(); always check for them with .isnull().sum().
Column operations ( df['new'] = ... ) let you engineer features quickly.
groupby() is powerful for grouped aggregation (like finding averages by category).
Use .loc for label-based selection, .iloc for position-based selection.
sort_values helps organize data; combine with head() for top/bottom rows.
.apply is flexible for custom transformations.
Use drop_duplicates to ensure unique data entries.
set_index is useful for making a column the index for easier lookups.