Pandas Cheat Sheet
Data Wrangling with pandas

Pandas API Reference · Pandas User Guide

Tidy data complements pandas's vectorized operations. In a tidy data set:
- Each variable is saved in its own column.
- Each observation is saved in its own row.
pandas will automatically preserve observations as you manipulate variables. No other format works as intuitively with pandas.
Creating DataFrames

   a  b  c
1  4  7  10
2  5  8  11
3  6  9  12

df = pd.DataFrame(
    {"a": [4, 5, 6],
     "b": [7, 8, 9],
     "c": [10, 11, 12]},
    index=[1, 2, 3])
    Specify values for each column.

df = pd.DataFrame(
    [[4, 7, 10],
     [5, 8, 11],
     [6, 9, 12]],
    index=[1, 2, 3],
    columns=['a', 'b', 'c'])
    Specify values for each row.

df = pd.DataFrame(
    {"a": [4, 5, 6],
     "b": [7, 8, 9],
     "c": [10, 11, 12]},
    index=pd.MultiIndex.from_tuples(
        [('d', 1), ('d', 2), ('e', 2)],
        names=['n', 'v']))
    Create DataFrame with a MultiIndex.

Reshaping Data – Change layout, sorting, reindexing, renaming

pd.melt(df)
    Gather columns into rows.
df.pivot(columns='var', values='val')
    Spread rows into columns.
pd.concat([df1, df2])
    Append rows of DataFrames.
pd.concat([df1, df2], axis=1)
    Append columns of DataFrames.
df.sort_values('mpg')
    Order rows by values of a column (low to high).
df.sort_values('mpg', ascending=False)
    Order rows by values of a column (high to low).
df.rename(columns={'y': 'year'})
    Rename the columns of a DataFrame.
df.sort_index()
    Sort the index of a DataFrame.
df.reset_index()
    Reset index of DataFrame to row numbers, moving index to columns.
df.drop(columns=['Length', 'Height'])
    Drop columns from DataFrame.

Subset Observations - rows

df[df.Length > 7]
    Extract rows that meet logical criteria.
df.drop_duplicates()
    Remove duplicate rows (only considers columns).
df.sample(frac=0.5)
    Randomly select fraction of rows.
df.sample(n=10)
    Randomly select n rows.
df.nlargest(n, 'value')
    Select and order top n entries.
df.nsmallest(n, 'value')
    Select and order bottom n entries.
df.head(n)
    Select first n rows.
df.tail(n)
    Select last n rows.

Subset Variables - columns

df[['width', 'length', 'species']]
    Select multiple columns with specific names.
df['width'] or df.width
    Select single column with specific name.
df.filter(regex='regex')
    Select columns whose name matches regular expression regex.

Using query

query() allows Boolean expressions for filtering rows.
df.query('Length > 7')
df.query('Length > 7 and Width < 8')
df.query('col.str.startswith("abc")', engine="python")
    String methods can be used in query expressions with the python engine.

Subsets - rows and columns

Use df.loc[] and df.iloc[] to select only rows, only columns or both.
Use df.at[] and df.iat[] to access a single value by row and column.
First index selects rows, second index columns.

df.iloc[10:20]
    Select rows 10-20.
df.iloc[:, [1, 2, 5]]
    Select columns in positions 1, 2 and 5 (first column is 0).
df.loc[:, 'x2':'x4']
    Select all columns between x2 and x4 (inclusive).
df.loc[df['a'] > 10, ['a', 'c']]
    Select rows meeting logical condition, and only the specific columns.
df.iat[1, 2]
    Access single value by index position.
df.at[4, 'A']
    Access single value by label.
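A minimal runnable sketch of the selectors above, using a small made-up DataFrame (the column names `Length`, `Width`, `species` are illustrative):

```python
import pandas as pd

# Hypothetical example data for the selectors above.
df = pd.DataFrame(
    {"Length": [5, 8, 9],
     "Width": [2, 4, 6],
     "species": ["a", "b", "c"]},
    index=[1, 2, 3])

rows = df[df.Length > 7]            # boolean row filter: keeps rows 2 and 3
cols = df[["Length", "species"]]    # column subset by name
both = df.loc[df["Length"] > 7, ["Length"]]  # rows meeting condition, one column
single = df.at[2, "Width"]          # single value by label
pos = df.iat[0, 1]                  # single value by integer position
```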
Method Chaining

Most pandas methods return a DataFrame so that another pandas method can be applied to the result. This improves readability of code.

df = (pd.melt(df)
      .rename(columns={
          'variable': 'var',
          'value': 'val'})
      .query('val >= 200'))

Logic in Python (and pandas)

<    Less than                  !=  Not equal to
>    Greater than               df.column.isin(values)  Group membership
==   Equals                     pd.isnull(obj)  Is NaN
<=   Less than or equals        pd.notnull(obj)  Is not NaN
>=   Greater than or equals     &, |, ~, ^, df.any(), df.all()  Logical and, or, not, xor, any, all

regex (Regular Expressions) Examples

'\.'              Matches strings containing a period '.'
'Length$'         Matches strings ending with word 'Length'
'^Sepal'          Matches strings beginning with the word 'Sepal'
'^x[1-5]$'        Matches strings beginning with 'x' and ending with 1,2,3,4,5
'^(?!Species$).*' Matches strings except the string 'Species'
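The chain above can be run end to end; here is a minimal sketch with made-up data:

```python
import pandas as pd

# Two hypothetical columns of measurements.
df = pd.DataFrame({"A": [150, 250], "B": [300, 100]})

# melt -> rename -> query, as in the chaining example above.
out = (pd.melt(df)
       .rename(columns={"variable": "var", "value": "val"})
       .query("val >= 200"))
# out keeps only the melted rows whose value is at least 200
```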
Cheatsheet for pandas ([Link]) originally written by Irv Lustig, Princeton Consultants, inspired by the RStudio Data Wrangling Cheatsheet.
Group Data

df.groupby(by="col")
    Return a GroupBy object, grouped by values in column named "col".
df.groupby(level="ind")
    Return a GroupBy object, grouped by values in index level named "ind".

All of the summary functions listed above can be applied to a group.
Additional GroupBy functions:
size()
    Size of each group.
agg(function)
    Aggregate group using function.

The examples below can also be applied to groups. In this case, the function is applied on a per-group basis, and the returned vectors are of the length of the original DataFrame.

shift(1)
    Copy with values shifted by 1.
shift(-1)
    Copy with values lagged by 1.
rank(method='dense')
    Ranks with no gaps.
rank(method='min')
    Ranks. Ties get min rank.
rank(pct=True)
    Ranks rescaled to interval [0, 1].
rank(method='first')
    Ranks. Ties go to first value.
cumsum()
    Cumulative sum.
cummax()
    Cumulative max.
cummin()
    Cumulative min.
cumprod()
    Cumulative product.

Summarize Data

df['w'].value_counts()
    Count number of rows with each unique value of variable.
len(df)
    # of rows in DataFrame.
df.shape
    Tuple of # of rows, # of columns in DataFrame.
df['w'].nunique()
    # of distinct values in a column.
df.describe()
    Basic descriptive statistics for each column (or GroupBy).
df.info()
    Prints a concise summary of the DataFrame.
df.memory_usage()
    Returns the memory usage of each column in the DataFrame.
df.dtypes
    Series with the dtype of each column in the DataFrame.

Handling Missing Data

df.dropna()
    Drop rows with any column having NA/null data.
df.fillna(value)
    Replace all NA/null data with value.

Make New Columns

df.assign(Area=lambda df: df.Length * df.Height)
    Compute and append one or more new columns.
df['Volume'] = df.Length * df.Height * df.Depth
    Add single column.
pd.qcut(df.col, n, labels=False)
    Bin column into n buckets.

Combine Data Sets

adf           bdf
x1 x2         x1 x3
A  1          A  T
B  2          B  F
C  3          D  T

Standard Joins

pd.merge(adf, bdf, how='left', on='x1')
    Join matching rows from bdf to adf.
    x1 x2 x3
    A  1  T
    B  2  F
    C  3  NaN
pd.merge(adf, bdf, how='right', on='x1')
    Join matching rows from adf to bdf.
    x1 x2   x3
    A  1.0  T
    B  2.0  F
    D  NaN  T
pd.merge(adf, bdf, how='inner', on='x1')
    Join data. Retain only rows in both sets.
    x1 x2 x3
    A  1  T
    B  2  F
pd.merge(adf, bdf, how='outer', on='x1')
    Join data. Retain all values, all rows.
    x1 x2   x3
    A  1    T
    B  2    F
    C  3    NaN
    D  NaN  T

Filtering Joins

adf[adf.x1.isin(bdf.x1)]
    All rows in adf that have a match in bdf.
    x1 x2
    A  1
    B  2
adf[~adf.x1.isin(bdf.x1)]
    All rows in adf that do not have a match in bdf.
    x1 x2
    C  3
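The joins above can be reproduced directly; a minimal sketch using the `adf`/`bdf` tables shown:

```python
import pandas as pd

# The adf/bdf example tables from above.
adf = pd.DataFrame({"x1": ["A", "B", "C"], "x2": [1, 2, 3]})
bdf = pd.DataFrame({"x1": ["A", "B", "D"], "x3": ["T", "F", "T"]})

left = pd.merge(adf, bdf, how="left", on="x1")    # keeps A, B, C; x3 is NaN for C
inner = pd.merge(adf, bdf, how="inner", on="x1")  # keeps only A and B
semi = adf[adf.x1.isin(bdf.x1)]                   # filtering join: adf rows with a match
anti = adf[~adf.x1.isin(bdf.x1)]                  # adf rows without a match
```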
Summary Functions

pandas provides a large set of summary functions that operate on different kinds of pandas objects (DataFrame columns, Series, GroupBy, Expanding and Rolling (see below)) and produce single values for each of the groups. When applied to a DataFrame, the result is returned as a pandas Series for each column. Examples:

sum()
    Sum values of each object.
count()
    Count non-NA/null values of each object.
median()
    Median value of each object.
quantile([0.25, 0.75])
    Quantiles of each object.
apply(function)
    Apply function to each object.
min()
    Minimum value in each object.
max()
    Maximum value in each object.
mean()
    Mean value of each object.
var()
    Variance of each object.
std()
    Standard deviation of each object.

Vector Functions

pandas provides a large set of vector functions that operate on all columns of a DataFrame or a single selected column (a pandas Series). These functions produce vectors of values for each of the columns, or a single Series for the individual Series. Examples:

max(axis=1)
    Element-wise max.
min(axis=1)
    Element-wise min.
clip(lower=-10, upper=10)
    Trim values at input thresholds.
abs()
    Absolute value.

Set-like Operations

ydf           zdf
x1 x2         x1 x2
A  1          B  2
B  2          C  3
C  3          D  4

pd.merge(ydf, zdf)
    Rows that appear in both ydf and zdf (Intersection).
    x1 x2
    B  2
    C  3
pd.merge(ydf, zdf, how='outer')
    Rows that appear in either or both ydf and zdf (Union).
    x1 x2
    A  1
    B  2
    C  3
    D  4
(pd.merge(ydf, zdf, how='outer', indicator=True)
 .query('_merge == "left_only"')
 .drop(columns=['_merge']))
    Rows that appear in ydf but not zdf (Setdiff).
    x1 x2
    A  1

Windows

df.expanding()
    Return an Expanding object allowing summary functions to be applied cumulatively.
df.rolling(n)
    Return a Rolling object allowing summary functions to be applied to windows of length n.
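Expanding and rolling windows can be combined with any of the summary functions above; a minimal sketch on a made-up Series:

```python
import pandas as pd

# A small example Series for windowed summaries.
s = pd.Series([1, 2, 3, 4])

exp_sum = s.expanding().sum()   # cumulative sums: 1, 3, 6, 10
roll_sum = s.rolling(2).sum()   # sums over windows of length 2: NaN, 3, 5, 7
```

The first entry of the rolling result is NaN because a full window of length 2 is not yet available.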
Plotting

df.plot.line()
    Plot a line graph of the DataFrame.
df.plot.scatter(x='w', y='h')
    Plot a scatter graph of the DataFrame.
df.plot.hist()
    Plot a histogram of the DataFrame.
df.plot.pie()
    Plot a pie chart of the DataFrame.
df.plot.bar() / df.plot.barh()
    Plot a vertical / horizontal bar graph of the DataFrame.
df.plot.area()
    Plot an area graph of the DataFrame.
df.plot.hexbin(x='w', y='h')
    Plot a hexbin graph of the DataFrame.

df.plot.line(subplots=True)
    Separate into different graphs for each column in the DataFrame.
df.plot.hist(cumulative=True)
    Creates a cumulative plot.
df.plot.hist(bins=30)
    Set the number of bins into which data is grouped (histograms).
df.plot.bar(stacked=True)
    Stacks the data for the columns on top of each other (bar, barh and area only).
df.plot.area(alpha=0.5)
    Sets the transparency of the plot to 50%.
df.plot.line(title="Graph of A against B")
    Sets the title of the graph.

Arguments can be combined for more flexibility when graphing. For example,
df.plot.line(subplots=True, title=['col1', 'col2', 'col3'])
would plot a separate line graph for each column of a 3-column DataFrame. The first string in the list of titles applies to the graph of the left-most column.

Frequently Used Options

pandas offers some 'options' to globally control how pandas behaves, display etc. Options can be queried and set via pd.options.<category>.option_name (where option_name is the name of an option). For example:

pd.options.display.max_rows = 20
    Set the display.max_rows option to 20.

Functions

pd.get_option(option)
    Fetch the value of the given option.
pd.set_option(option, value)
    Set the value of the given option.
pd.reset_option(option)
    Reset the given option(s) to default settings.
pd.describe_option(option)
    Print descriptions of given options.
pd.option_context(option, value, ...)
    Execute code with temporary option settings that revert to prior settings after execution.
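A minimal sketch of the option functions above, including the automatic revert behaviour of option_context:

```python
import pandas as pd

# Remember the current value so it can be restored at the end.
default_rows = pd.get_option("display.max_rows")

pd.set_option("display.max_rows", 20)
assert pd.options.display.max_rows == 20  # same option, attribute-style access

# Inside the context the option is temporarily overridden...
with pd.option_context("display.max_rows", 5):
    assert pd.get_option("display.max_rows") == 5
# ...and reverts to the prior setting afterwards.
assert pd.get_option("display.max_rows") == 20

pd.reset_option("display.max_rows")  # back to the pandas default
```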
Changing Type

pd.to_numeric(data)
    Convert non-numeric types to numeric.
pd.to_datetime(data)
    Convert non-datetime types to datetime type.
pd.to_timedelta(data)
    Convert non-timedelta types to timedelta.
df.astype(type)
    Convert data to (almost) any given type, including categorical.
df.infer_objects()
    Attempts to infer a better type for object-type data.
df.convert_dtypes()
    Convert columns to the best possible dtypes.

Series String Operations

Similar to Python string operations, except these are vectorized to apply to the entire Series efficiently.

s.str.count(pattern)
    Returns a series with the integer counts of the pattern in each element.
s.str.get(index)
    Returns a series with the data at the given index for each element.
s.str.cat(sep=sep)
    Concatenate elements into a single string.
s.str.join(sep)
    Returns a series where each element has been concatenated with the separator.
s.str.partition(sep)
    Splits the string on the first instance of the separator.
s.str.slice(start, stop, step)
    Slices each string.
s.str.title()
    Converts the first character of each word to a capital.
s.str.replace(pat, rep, regex=True)
    Use regex to replace patterns in each string.
s.str.len()
    Returns a series with the lengths of each element.
s.str.isalnum()
    Checks whether each element is alpha-numeric.

Datetime

With a Series containing data of type datetime, the dt accessor is used to get various components of the datetime values:

s.dt.year
    Extract the year.
s.dt.month
    Extract the month as an integer.
s.dt.day
    Extract the day (int) from the date.
s.dt.hour
    Extract the hour.
s.dt.minute
    Extract the minute.
s.dt.second
    Extract the second.
s.dt.quarter
    Find which quarter the date lies in.

Mapping

Apply a mapping to every element in a DataFrame or Series, useful for recategorizing or transforming data.

s.map(lambda x: 2 * x)
    Returns a copy of the series where every entry is doubled.
df.apply(lambda s: s.max() - s.min(), axis=1)
    Returns a Series with the difference of the maximum and minimum values of each row of the DataFrame.

Input/Output

Common file types for data input include CSV, JSON and HTML, which are human-readable, while the common output types are usually more optimized for performance and scalability, such as feather, parquet and HDF.

df = pd.read_csv(filepath)
    Read data from a csv file.
df = pd.read_html(filepath)
    Read tables from an html file (returns a list of DataFrames).
df = pd.read_excel(filepath)
    Read data from xls (and related) files.
df = pd.read_sql(query, connection)
    Read data from a SQL database.
df = pd.read_clipboard()
    Read data from the system clipboard.
df.to_parquet(filepath)
    Write data to a parquet file.
df.to_feather(filepath)
    Write data to a feather file.
df.to_hdf(filepath, key)
    Write data to an HDF file.
df.to_clipboard()
    Copy object to the system clipboard.

Display Options

display.max_rows
    The maximum number of rows displayed in pretty-print.
display.max_columns
    The maximum number of columns displayed in pretty-print.
display.expand_frame_repr
    Controls whether the DataFrame representation stretches across pages.
display.large_repr
    Controls whether a DataFrame that exceeds maximum rows/columns is truncated or summarized.
display.precision
    The output display precision in decimal places.
display.max_colwidth
    The maximum width of columns; longer cells will be truncated.
display.max_info_columns
    The maximum number of columns displayed after calling info().
display.chop_threshold
    Sets the rounding threshold to zero when displaying a Series/DataFrame.
display.colheader_justify
    Controls how column headers are justified.
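A minimal sketch tying together the str accessor, the dt accessor and map, on made-up data:

```python
import pandas as pd

# String operations are vectorized over the whole Series.
s = pd.Series(["hello world", "pandas cheat sheet"])
titled = s.str.title()   # capitalize the first character of each word
lengths = s.str.len()    # length of each element

# The dt accessor extracts components of datetime values.
dates = pd.to_datetime(pd.Series(["2021-03-15", "2022-11-02"]))
years = dates.dt.year
quarters = dates.dt.quarter

# map applies a function to every element of a Series.
nums = pd.Series([1, 2, 3])
doubled = nums.map(lambda x: 2 * x)
```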