Python Pandas (pd) Cheatsheet — Quick Reference

Setup
import pandas as pd
import numpy as np
pd.__version__

Core Objects
# Series: 1D labeled array
s = pd.Series([10, 20, 30], index=['a','b','c'])

# DataFrame: 2D labeled, tabular
df = pd.DataFrame({'A':[1,2,3], 'B':[4,5,6]})

Create / Read Data
pd.DataFrame(data) # from dict/array
pd.read_csv('file.csv') # CSV
pd.read_excel('file.xlsx', sheet_name=0) # Excel
pd.read_json('file.json') # JSON
pd.read_parquet('file.parquet') # Parquet
pd.read_sql('SELECT * FROM t', conn) # SQL

Quick Inspect
df.head(3); df.tail(3)
df.shape; df.columns; df.index; df.dtypes
df.info() # schema
df.describe(include='all') # stats

Select Columns/Rows
df['A'] # Series
df[['A','B']] # subset columns
df.loc['row_label', 'A'] # label-based
df.iloc[0:5, 0:2] # position-based
df.at['row_label','A'] # scalar fast (label)
df.iat[0,0] # scalar fast (pos)

Filter Rows
df[df['A'] > 2]
df[(df['A'] > 1) & (df['B'] < 6)]
df.query('A > 1 and B < 6') # expressive
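
A worked mask on the toy df above (A=[1,2,3], B=[4,5,6]):
mask = (df['A'] > 1) & (df['B'] < 6) # boolean Series
df.loc[mask] # keeps only the row A=2, B=5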

Assign / Create Columns
df['C'] = df['A'] + df['B']
df.assign(D=lambda x: x.A * 2)
df.rename(columns={'A':'alpha'}, index=str.upper) # index mapper assumes string labels
df.rename_axis('row_id').reset_index()

Missing Data
df.isna(); df.notna()
df.fillna(0); df.ffill() # fillna(method='ffill') is deprecated
df.dropna() # drop rows
df.dropna(axis=1) # drop cols
df.interpolate() # numeric

Sorting
df.sort_values(by=['A','B'], ascending=[True, False])
df.sort_index()

Groupby / Aggregation
g = df.groupby('A') # groupby key
g['B'].agg(['mean','max','count'])
df.groupby('key').agg(total=('val','sum'), avg=('val','mean'))
df.groupby(['k1','k2'], dropna=False).size()
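
Named aggregation on toy data (a sketch; 'key'/'val' are illustrative names):
d = pd.DataFrame({'key': ['x','x','y'], 'val': [1, 2, 10]})
d.groupby('key').agg(total=('val','sum'), avg=('val','mean'))
# key x -> total=3, avg=1.5; key y -> total=10, avg=10.0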

Reshape: Pivot / Melt
pd.pivot_table(df, index='id', columns='key', values='val', aggfunc='sum')
df.pivot(index='id', columns='key', values='val')
pd.melt(df, id_vars=['id'], var_name='metric', value_name='value')
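
Pivot and melt round-trip sketch (toy frame; names are illustrative):
long = pd.DataFrame({'id':[1,1,2], 'key':['a','b','a'], 'val':[10,20,30]})
wide = long.pivot(index='id', columns='key', values='val') # NaN where (id,key) missing
wide.reset_index().melt(id_vars='id', var_name='key', value_name='val')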

Merge / Join / Concat
pd.merge(df1, df2, on='id', how='inner') # left/right/outer
pd.merge(df1, df2, left_on='a', right_on='b')
pd.concat([df1, df2], axis=0, ignore_index=True) # stack rows
pd.concat([df1, df2], axis=1) # side-by-side
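
Join semantics at a glance (toy frames; 'id' is illustrative):
left = pd.DataFrame({'id':[1,2], 'x':['a','b']})
right = pd.DataFrame({'id':[2,3], 'y':['c','d']})
pd.merge(left, right, on='id', how='inner') # id=2 only
pd.merge(left, right, on='id', how='outer') # ids 1,2,3 with NaN where unmatched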

Dates & Times


df['t'] = pd.to_datetime(df['t']) # parse
df.set_index('t').loc['2025-10'] # slice by month
df['t'].dt.year; df['t'].dt.month; df['t'].dt.day_name()
df.resample('D').sum() # time-based group (needs DatetimeIndex)
pd.date_range('2025-01-01', periods=7, freq='D')
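
Resample sketch with synthetic hourly data (assumes a DatetimeIndex):
ts = pd.Series(range(6), index=pd.date_range('2025-01-01', periods=6, freq='h'))
ts.resample('D').sum() # one daily row: 0+1+...+5 = 15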

String Ops (vectorized)
s.str.lower(); s.str.upper(); s.str.len()
s.str.contains('pat'); s.str.replace('x','y', regex=True)
s.str.extract(r'(\d+)'); s.str.split('-', expand=True)
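
Extract/split sketch (pattern and data are illustrative):
codes = pd.Series(['ab-12', 'cd-34'])
codes.str.extract(r'(\d+)') # one-column DataFrame: '12', '34'
codes.str.split('-', expand=True) # two columns: ab/cd and 12/34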

Categoricals
df['cat'] = pd.Categorical(df['cat'], categories=['low','med','high'], ordered=True)
df['cat'] = df['cat'].cat.reorder_categories(['low','med','high']) # returns a new Series
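
Ordered categoricals sort by category order and support comparisons (sketch):
sev = pd.Series(pd.Categorical(['high','low','med'], categories=['low','med','high'], ordered=True))
sev.sort_values() # low, med, high (category order, not alphabetical)
sev > 'low' # elementwise ordered comparison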

Apply / Map
df['A'].map({1:'one',2:'two'})
df['B'].apply(lambda x: x**2) # Series-wise
df.apply(np.sum, axis=0) # column-wise
df.apply(np.sum, axis=1) # row-wise

Descriptive Stats
df.mean(numeric_only=True); df.std(numeric_only=True)
df['A'].quantile([0.25, 0.5, 0.75])
df.corr(numeric_only=True); df.cov()
df.value_counts(dropna=False)

Window Ops (Rolling/Expanding)
df['A'].rolling(window=3).mean()
df['A'].expanding(min_periods=1).sum()
df['A'].ewm(alpha=0.3).mean() # EWMA
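
Rolling output is NaN until the window fills; quick check:
pd.Series([1, 2, 3, 4]).rolling(window=3).mean()
# NaN, NaN, 2.0, 3.0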

Indexing Utilities
df.set_index('id', inplace=False)
df.reset_index(drop=True)
df.reindex(range(10)).reindex(columns=['A','B','C'])
df.swaplevel(0,1, axis=0) # MultiIndex
df.sort_index(level=[0,1])

Plotting (quick)
# Matplotlib must be installed; Pandas uses it under the hood
df.plot() # line by default
df['A'].plot(kind='hist') # histogram
df.plot(kind='scatter', x='A', y='B')
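
In a plain script the figure needs an explicit show (a sketch; notebooks render inline):
import matplotlib.pyplot as plt
ax = df.plot(kind='scatter', x='A', y='B')
plt.show()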

Export / Save
df.to_csv('out.csv', index=False)
df.to_excel('out.xlsx', index=False)
df.to_parquet('out.parquet')
df.to_json('out.json', orient='records', lines=True)
df.to_sql('table', conn, if_exists='replace', index=False)

Performance Tips
# Prefer vectorization over Python loops
# Use categoricals for low-cardinality strings
# For large CSVs: use dtype=, usecols=, chunksize=
# Consider Parquet for fast IO and types
# Use pd.options.mode.copy_on_write = True # (pandas 2.0+)
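
Sketch of the large-CSV tips above (file name and columns are illustrative):
chunks = pd.read_csv('big.csv', usecols=['id','val'],
                     dtype={'id':'int32', 'val':'float32'}, chunksize=100_000)
total = sum(chunk['val'].sum() for chunk in chunks)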
