0% found this document useful (0 votes)

56 views6 pages

Pandas Data Manipulation Cheat Sheet

The document provides an overview of key Pandas concepts for working with Series and DataFrames. It describes how Series are 1-dimensional labeled arrays that can have names, indexes, and different data types. DataFrames are 2-dimensional tables that contain rows and columns, with each column being a Series. The document outlines many common Series and DataFrame methods for data manipulation, such as filtering, sorting, descriptive statistics, and transformations.

Uploaded by

Soumabrata Sarkar

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

56 views6 pages

Pandas Data Manipulation Cheat Sheet

Uploaded by

Soumabrata Sarkar

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOCX, PDF, TXT or read online on Scribd

You are on page 1/ 6

Pandas: Python Package | Fast, flexible and powerful data manipulation tool

Series: 1 Dimensional labelled array of any data type

>Bros = pd.Series(data = [‘Soum’ , ’Ayush’], index = [1,2], dtype = ’string’) # Using List for creating Series
Note: When Index are not specified Pandas assign a sequence of integers starting from zero. This is a built-in Pandas
object called ‘RangeIndex’

Method & Attributes:

Method – A function bound to the object (Example: List1.equals(List2)
Attribute – A variable bound to the object (Example: List1.dtype, Bros.name = ‘Cool Dudes!’)
Note: Series can have names and those names can be used as column names in DataFrame. Similarly, the Series Index
can also have its own name

Add_prefix() & Add_suffix() Method: (Doesn’t modify the original Series labels)
>Alphabet.add_prefix(‘_prefix_label’)
>Alphabet.add_suffix(‘_suffix_label’)

Boolean mask and the .loc Indexer:

Boolean Mask - Used to index select items at scale | work with [] and .loc | need to be of same length as Series
>Alphabet.loc[[True if i%2==0 else False for i in range(26)]]

.loc – The prototypical way of doing label based extraction

.iloc – Way of doing integer / location based extraction
Example:
>Alphabet = pd.Series(data = [‘A’,’B’,’C’,’D’], index = [1,2,3,4], dtype=’string’)
>Alphabet.loc[1] ---> Output: A
>Alphabet.iloc[1] ---> Output: B

Selecting with .get():

-‘get()’ works for both ‘index based on labels’ as well as ‘location based extraction’
>Alphabet.get(‘label_A’) ---> Output: A
>Alphabet.get(0) ---> Output: A
>Alphabet.get(‘Anything’, default=’Couldn’t find anything!’) ---> Output: Couldn’t find anything!
Note: default of .get() is set to ‘None’

Series Methods and Handling:

pd.read_csv – Read a comma-separated values (csv) file into the DataFrame
>Alcohol = pd.read_csv(‘https://andybek.com/drinks’, usecols = [‘country’,’wine_servings’], index_col = [‘country’])
>type(Alcohol) ---> pandas.core.frame.DataFrame

If the parsed data only contains one column then setting squeeze parameter as True will return a Series
>Alcohol = pd.read_csv(‘https://andybek.com/drinks’, usecols = [‘country’,’wine_servings’], index_col = [‘country’],
squeeze = True)
>type(Alcohol) ---> pandas.core.series.Series
Size & Shape:
.size - number if elements in the Series
.shape - tuple of the dimension for a Series

Unique values and Series Monotonicity:

.is_unique - Simply checks if the Series contains a sequence of Unique values (True/Values)
.nunique() - Gives exact counts of unique values in a Series (excluding NA) (#)
.nunique(dropna = False) - Gives counts of unique values in a Series (including NA)
.unique() - Gives a list of unique values in the Series
.is_monotonic / .is_monotonic_increasing - Checks if Series is increasing / stagnation
.is_monotonic_decreasing - Checks if Series is decreasing

The count() Method:

Series.count() - Return number of non-NA / Null observations in the Series
.isna().sum() - Return uber of Null elements in the Series

Dropping and Filling NAs:

>Alcohol.fillna(100, inplace = False)
>Alcohol.dropna(inplace = False) - Exclude NAs from the Series
Note: Both methods return a copy of the Series unless the ‘inplace’ parameter is set to True

Descriptive Statistics:
>Alcohol.sum() - Excludes NA’s
>Alcohol.mean()
>Alcohol.median()
>Alcohol.quantile(q=0.5)
IQR (Interquantile Range) -> Alcohol.quantile(0.75) – Alcohol.qunatile(0.25)
>Alcohol.min()
>Alcohol.max()
>Alcohol.std()
>Alcohol.var()
Note: Alcohol.std()**2 = Alcohol.var() | Mode - Item with highest frequency

Describe() Method:
Gives an overall statistical description of the dataset
>Alcohol.describe(percentile = [0.79,0.19], include = float, exclude = object)

Value_counts():
A sorted series containing unique values and their counts
>Alcohol.value_counts(sort = True, ascending = False, dropna = True, normalize = False)
Note: Normalize provides relative frequency

idxmax() & idxmin():

idxmax() - Returns the label of the row with minimum value
idxmin() - Returns the label of the row with minimum value
Note: If ‘multiple’ min/max values are present then only the first label is returned
nlargest() & nsmallest():
>Alcohol.nlargest(n = 10) - Shows largest 10 alcohol consuming countries / values
>Alcohol.nsmallest(n = 10) - Shows smallest 10 alcohol consuming countries / values

Sorting with sort_values():

Returns a new Series, sorted by values
>Alcohol.sort_values(ascending = False, na_position = ‘last’, kind = ‘quicksort’, inplace = False)
Note: Have other sorting options such as ‘mergesort’ or ‘heapsort’, default is ‘quicksort’

Sorting wit sort_index():

Returns a new Series, sorted by index labels
>Alcochol.sort_index(ascending = True, na_postion = ‘last’, inplace = False)
Note: NaN won’t show up in the sorting if not present in the index label

Series Arithmetic and fill_value():

Use to add / subtract / divide / multiply two Series
>Alcohol.add(More_Drinks, fill_value = 0)
>Alcohol.subtract(More_Drinks, fill_value = 0)
>Alcohol.divide(More_Drinks, fill_value = 1)
>Alcohol.multiply(More_Drinks, fill_value = 1)
Note: ‘fill_value’ is set to 0 / 1 so that we don’t end up losing data due to Series misalignment

Calculating Variance and Standard Deviation:

Variance - The average of squared differences from the mean

>(Alcohol.subtract(Alcohol.mean())**2).sum()/(Alcohol.count() -1)
Note: Standard Deviation is square of Variance

Cumulative Operations:
>Alcohol.cumsum(skipna =True) - Calculate a progressive / cumulative sum (of the values preceding in the Series)
>Alcohol.cumprod()
>Alcohol.cummin()
>Alcohol.cummax()
Note: ‘NaN’ are skipped (i.e., ‘skipna’ parameter is set to ‘True’ by default) as sum of any number with ‘NaN’ is ‘NaN’)

Pairwise Difference with Diff():

Used to calculate the discrete difference for pair of elements in a Series
>Ser.diff(periods = 1)
Series Iteration:
>for i in Alcohol:
print(i)
-Prints value / alcohol consumption without labels

>for i in Alcohol.index:
print(i)
-Prints labels without values

>for i in Alcohol.index:
print(i, Alcohol[i])
-Prints both label and value

>for i in Alcohol.items(): / for i in Alcohol.iteritems():

print(i)
-Returns a lazy evaluated iterable of tuple | Behind the scene used ‘zip’ method

Filtering: filter(), where() and mask()

>Alcohol.filter(regex = ‘^V’) - Will filter countries starting with letter ‘V’
>Alcohol.filter(like = ‘stan’) - Will filter countries having ‘stan’ in them
Note: Filtering is done along index labels and not on values on using ‘filter()’ function. For filtering for values, use square
brackets (Example: Alcohol[Alcohol > 200])

pandas.Series.where() - Replace values with ‘NaN / set value’ where the condition is False
>Alcohol.where(lambda x: x > 200, np.nan).dropna()

pandas.Series.mask() - Replace values with ‘NaN / set value’ where the condition is True
>Alcohol.mask(lambda x: x > 200, np.nan).dropna()

Note: Default replacing value is ‘NaN’

Transforming with update(), apply() & map():

update() - Update series with newly added values
>Alcohol.update(pd.Series(data = [200,20], index = [‘Albania’, ‘Algeria’]))

apply() - Applies a transformation to each and every element in a series

>Alcohol.apply(lambda x : x**2) | Alcohol.apply(np.square)

map() - Map values of Series according to input correspondence | Used for substituting each value in a Series with
another value, that may be derived from a function, a dictionary Series
>Ser.map({‘old value’ : ‘new value’})
>Alcohol.map(lambda x : x**2)
DataFrame:
A table of data that contains a collection of rows and columns.
Key Aspects -
1. DataFrames have two dimensions: labeled indices and columns
2. Each column in a DataFrame is a Series and each column must be of same size
3. Unlike Series, DataFrames could be heterogenous (i.e., have multiple data types)

>names = [‘Olga’, ‘Andre’, ‘Brian’]

>ages = [29, 21, 45]
>married = [False, True, True]
>df = pd.DataFrame({‘name’ : names, ‘age’ : ages, ‘married’ : married})

Need to specify index position as well as column position / name to fetch specific value from DataFrame
> df.iloc[2,0] --> Output: ‘Brian’

.dtypes: Shows columns name and their datatypes

Note: ‘.dtype’ will throw an error as DataFrame can have multiple data types

.ndim & .shape:

.ndim - Direct way of comparing dimension of the dataset
.shape - Gives shape of the dataset

More ways for DataFrame:

1. Dict of Tuples (Column-wise)
>tuple_name = tuple(names)
>tuple_ages = tuple(ages)
>tuple_married = tuple(married)
>pd.DataFrame({‘name’ : tuple_name, ‘age’ : tuple_ages, ‘married’ : tuple_married})

2. Dict of Dicts (Column-wise)

>series_name = pd.Series(names)
>series_ages = pd.Series(ages)
>series_married = pd.Series(married)
>pd.DataFrame({‘name’ : series_name, ‘age’ : series_ages, ‘married’ : series_married})

3. Dict of Series (Column-wise)

>dict_names = {k:v for k,v in enumerate(names)}
>dict_ages = {k:v for k,v in enumerate(ages)}
>dict_married = {k:v for k,v in enumerate(married)}
>pd.DataFrame({‘name’ : dict_name, ‘age’ : dict_ages, ‘married’ : dict_married})

4. List of Dicts (Row-wise)

>rowwise = [{‘name’ : name, ‘age’ : ages, ‘married’ : married} for name, ages, married in zip(names, ages, married)]

The info() Method:

Gives overview about the dataset. Only works for DataFrames
>df.info(verbose = True, max_cols = 4, memory_usage = ‘deep)
Note: verbose default is set to True. Setting it to False will remove column specific information
DataFrame:
>Nutrition = pd.DataFrame(‘http://andybek.com/pandas-nutrition’, index_col = [0])

.drop():
Removes specific rows / columns from the DataFrame
>Nutrition.drop(‘Unnamed : 0’, axis = 1)

.set_index():
Set specified column as index of the DataFrame
>Nutrition.set_index(‘Unnamed : 0’)

Cheat Python
No ratings yet
Cheat Python
8 pages
Pandas Python For Data Science
100% (1)
Pandas Python For Data Science
1 page
Pandas Cheat Sheet for Data Science
No ratings yet
Pandas Cheat Sheet for Data Science
1 page
Pandaspythonfordatascience
No ratings yet
Pandaspythonfordatascience
1 page
Pandas Cheat Sheet for Data Science
No ratings yet
Pandas Cheat Sheet for Data Science
1 page
Python For Data Science 1662157639
No ratings yet
Python For Data Science 1662157639
6 pages
Pandas Basics Cheat Sheet Guide
No ratings yet
Pandas Basics Cheat Sheet Guide
1 page
Data Science Notes Unit-1 Part - 2
No ratings yet
Data Science Notes Unit-1 Part - 2
22 pages
Mastering Pandas: A Comprehensive Guide
No ratings yet
Mastering Pandas: A Comprehensive Guide
13 pages
Pandas
No ratings yet
Pandas
5 pages
Pandas: DataFrames & Series Guide
No ratings yet
Pandas: DataFrames & Series Guide
2 pages
Pandas
No ratings yet
Pandas
36 pages
PandasGUIA PYTHON-04
No ratings yet
PandasGUIA PYTHON-04
1 page
Pandas
No ratings yet
Pandas
21 pages
Lecture 3 - Pandas
No ratings yet
Lecture 3 - Pandas
37 pages
Unit 2
No ratings yet
Unit 2
81 pages
Subject IP
No ratings yet
Subject IP
9 pages
05getting Started With Pandas
No ratings yet
05getting Started With Pandas
44 pages
Pandas
No ratings yet
Pandas
63 pages
Pandas Guide for Data Analysts
No ratings yet
Pandas Guide for Data Analysts
9 pages
Pandas - Cheat - Sheet (1) - 240511 - 113437
No ratings yet
Pandas - Cheat - Sheet (1) - 240511 - 113437
1 page
Pandas Basics Cheat Sheet Guide
No ratings yet
Pandas Basics Cheat Sheet Guide
1 page
WEBINTEL GUIDED LAB ACTIVITY Introduction To Pandas
No ratings yet
WEBINTEL GUIDED LAB ACTIVITY Introduction To Pandas
1 page
Series and Pandas Methods
No ratings yet
Series and Pandas Methods
5 pages
Pandas
No ratings yet
Pandas
13 pages
Cheat Sheet: The Pandas Dataframe Object: Preliminaries Get Your Data Into A Dataframe
100% (1)
Cheat Sheet: The Pandas Dataframe Object: Preliminaries Get Your Data Into A Dataframe
12 pages
Ip Study
No ratings yet
Ip Study
18 pages
Pandas DataFrame Notes
No ratings yet
Pandas DataFrame Notes
13 pages
Unit 3
No ratings yet
Unit 3
10 pages
Pandas DataFrame Cheat Sheet
No ratings yet
Pandas DataFrame Cheat Sheet
4 pages
Pandas DataFrame Cheat Sheet
100% (1)
Pandas DataFrame Cheat Sheet
10 pages
Unit 4
No ratings yet
Unit 4
27 pages
Python Cheat Sheet For Excel Users
100% (2)
Python Cheat Sheet For Excel Users
5 pages
Pandas DataFrame Notes
100% (1)
Pandas DataFrame Notes
10 pages
Python Pandas Cheat Sheet Guide
No ratings yet
Python Pandas Cheat Sheet Guide
11 pages
Pandas: Import
100% (1)
Pandas: Import
13 pages
Python and SQL Programming Index
No ratings yet
Python and SQL Programming Index
29 pages
Pandas DataFrame Notes
No ratings yet
Pandas DataFrame Notes
10 pages
MLL Ip Xii
No ratings yet
MLL Ip Xii
22 pages
Pandas
No ratings yet
Pandas
7 pages
Pandas
No ratings yet
Pandas
20 pages
Session2-DM Using Pandas
No ratings yet
Session2-DM Using Pandas
51 pages
Introduction to Pandas Basics
No ratings yet
Introduction to Pandas Basics
6 pages
Pandas
No ratings yet
Pandas
8 pages
Data Handling Using Pandas-1
No ratings yet
Data Handling Using Pandas-1
60 pages
NumPy, SciPy, Pandas, Quandl Cheat Sheet
100% (3)
NumPy, SciPy, Pandas, Quandl Cheat Sheet
4 pages
Hyperplane Theorems in Economics
No ratings yet
Hyperplane Theorems in Economics
9 pages
Waves of Light: The Annual Magazine of St. Mary's Central School, Pottayil, Trivandrum (2016 - 2017)
100% (1)
Waves of Light: The Annual Magazine of St. Mary's Central School, Pottayil, Trivandrum (2016 - 2017)
35 pages
Grade 5-6 MTAP Reviewer Set 1-9 2019
No ratings yet
Grade 5-6 MTAP Reviewer Set 1-9 2019
18 pages
The Oxford Handbook of Meaning in Life Iddo Landau Available Full Chapters
No ratings yet
The Oxford Handbook of Meaning in Life Iddo Landau Available Full Chapters
79 pages
Mathematics 20042018 Quiz
No ratings yet
Mathematics 20042018 Quiz
10 pages
Trigonometric Problem Solving Guide
No ratings yet
Trigonometric Problem Solving Guide
6 pages
Autonomous Car Path Tracking Methods
No ratings yet
Autonomous Car Path Tracking Methods
78 pages
12 One Marks
No ratings yet
12 One Marks
15 pages
Fractions: Chapter Overview
No ratings yet
Fractions: Chapter Overview
18 pages
Scilab Manual For Digital Signal Processing by MR Vijay P Sompur Electronics Engineering Visvesvraya Technological University
100% (1)
Scilab Manual For Digital Signal Processing by MR Vijay P Sompur Electronics Engineering Visvesvraya Technological University
51 pages
10th Lesson Notes - Chapter3
No ratings yet
10th Lesson Notes - Chapter3
21 pages
Circular Measure Practice Questions
No ratings yet
Circular Measure Practice Questions
9 pages
Couette Flow Solution Using Analytic and Crank-Nicolson Numerical
No ratings yet
Couette Flow Solution Using Analytic and Crank-Nicolson Numerical
16 pages
Gilles Deleuze Key Concepts 2nd Edition Charles J. Stivale (Editor) Digital Version 2025
No ratings yet
Gilles Deleuze Key Concepts 2nd Edition Charles J. Stivale (Editor) Digital Version 2025
154 pages
MGF and Applications of Binomial Distribution
No ratings yet
MGF and Applications of Binomial Distribution
24 pages
Modern Diplomacy 4th Edition R. P. Barston Online Version
No ratings yet
Modern Diplomacy 4th Edition R. P. Barston Online Version
88 pages
Ga17.1 Pattern, Subsitution, Simplifying, From MYP1 Pearson
No ratings yet
Ga17.1 Pattern, Subsitution, Simplifying, From MYP1 Pearson
3 pages
Partial Differential Equations Assessment Questions - Sanfoundry
No ratings yet
Partial Differential Equations Assessment Questions - Sanfoundry
9 pages
File-6 Past Papers
No ratings yet
File-6 Past Papers
77 pages
11-Equations of Circles
No ratings yet
11-Equations of Circles
4 pages
Cbse-Ix-Mpcb: Index
No ratings yet
Cbse-Ix-Mpcb: Index
6 pages
Half Yearly Exam Portion STD - X 2024
No ratings yet
Half Yearly Exam Portion STD - X 2024
1 page
Transportation Trip
No ratings yet
Transportation Trip
182 pages
EELU DS Week1 L2
No ratings yet
EELU DS Week1 L2
23 pages
Grade IV Math Word Problem Study
No ratings yet
Grade IV Math Word Problem Study
2 pages
Maths SQP 1
No ratings yet
Maths SQP 1
7 pages
Updated Sample Paper Foundation Semester 2020
No ratings yet
Updated Sample Paper Foundation Semester 2020
6 pages
Math Log Review
No ratings yet
Math Log Review
20 pages
Weekly Learning Plan Math 8 Week 1
No ratings yet
Weekly Learning Plan Math 8 Week 1
2 pages
02 - Analysis Reference - Chapter-2 Node DOF CoordinateSystem
No ratings yet
02 - Analysis Reference - Chapter-2 Node DOF CoordinateSystem
5 pages

Pandas Data Manipulation Cheat Sheet

Uploaded by

Pandas Data Manipulation Cheat Sheet

Uploaded by

Pandas: Python Package | Fast, flexible and powerful data manipulation tool

Series: 1 Dimensional labelled array of any data type

Method & Attributes:

Boolean mask and the .loc Indexer:

.loc – The prototypical way of doing label based extraction

Selecting with .get():

Series Methods and Handling:

Unique values and Series Monotonicity:

The count() Method:

Dropping and Filling NAs:

idxmax() & idxmin():

Sorting with sort_values():

Sorting wit sort_index():

Series Arithmetic and fill_value():

Calculating Variance and Standard Deviation:

Pairwise Difference with Diff():

>for i in Alcohol.items(): / for i in Alcohol.iteritems():

Filtering: filter(), where() and mask()

Note: Default replacing value is ‘NaN’

Transforming with update(), apply() & map():

apply() - Applies a transformation to each and every element in a series

>names = [‘Olga’, ‘Andre’, ‘Brian’]

.dtypes: Shows columns name and their datatypes

.ndim & .shape:

More ways for DataFrame:

2. Dict of Dicts (Column-wise)

3. Dict of Series (Column-wise)

4. List of Dicts (Row-wise)

The info() Method:

You might also like