Module - d2

The document provides an overview of the Python Pandas library, which is used for data manipulation and analysis through its data structures like Series, DataFrame, and Panel. It details key features, methods for creating DataFrames, and various operations such as merging, reshaping, and pivoting data. Additionally, it introduces the concept of Regular Expressions in Python for pattern matching in strings.

Uploaded by

Muhammed adhil

Module - 2

Data Wrangling
Python Pandas
Pandas is an open-source Python library that provides high-performance
data manipulation and analysis tools built on its powerful data structures.
Using Pandas, we can accomplish five typical steps in the processing
and analysis of data, regardless of the origin of the data: load, prepare,
manipulate, model, and analyze.
Python with Pandas is used in a wide range of academic and commercial
fields, including finance, economics, statistics, and analytics.
Key features of Pandas
• Fast and efficient DataFrame object with default and customized
indexing.
• Tools for loading data into in-memory data objects from different file
formats.
• Data alignment and integrated handling of missing data.
• Reshaping and pivoting of data sets.
• Label-based slicing, indexing and subsetting of large data sets.
• Columns from a data structure can be deleted or inserted.
• Group by data for aggregation and transformations.
• High performance merging and joining of data.
• Time Series functionality.
Data structures
Pandas deals with the following three data structures −

• Series
• DataFrame
• Panel

These data structures are built on top of NumPy arrays, which means
they are fast.
Data Structure   Dimensions   Description
Series           1            1D labeled homogeneous array, size-immutable.
DataFrame        2            General 2D labeled, size-mutable tabular structure with potentially heterogeneously typed columns.
Panel            3            General 3D labeled, size-mutable array.
Mutability
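All Pandas data structures are value mutable (their contents can be changed) and,
except for Series, all are size mutable. Series is size immutable.

Note − DataFrame is widely used and is one of the most important data
structures. Panel is used much less; it was deprecated in pandas 0.20 and
removed in pandas 0.25, so current releases offer only Series and DataFrame.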
Series

Series is a one-dimensional array-like structure with homogeneous
data. For example, the following series is a collection of integers 10, 23,
56, …
10 23 56 17 52 61 73 90 26 72

Key Points
Homogeneous data
Size Immutable
Values of Data Mutable
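
The Series described above can be sketched in a few lines. This is a minimal illustration using the ten integers from the example; the variable name `s` is chosen here and is not part of the original notes.

```python
import pandas as pd

# The ten integers from the example above, stored as a one-dimensional Series
s = pd.Series([10, 23, 56, 17, 52, 61, 73, 90, 26, 72])

print(s.dtype)   # every element shares a single dtype (homogeneous data)
print(len(s))    # number of elements

# Values are mutable: overwrite the element at label 0 in place
s[0] = 11
print(s.head(3))
```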
DataFrame

DataFrame is a two-dimensional array with heterogeneous data. For
example,
Name    Age   Gender   Rating
Steve   32    Male     3.45
Lia     28    Female   4.6
Vin     45    Male     3.9
Katie   38    Female   2.78

The table represents the data of a sales team of an organization with
their overall performance rating. The data is represented in rows and
columns. Each column represents an attribute and each row represents
a person.
Data Type of Columns

The data types of the four columns are as follows −

Column Type
Name String
Age Integer
Gender String
Rating Float

Key Points
Heterogeneous data
Size Mutable
Data Mutable
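
As a rough sketch, the sales-team table above can be built from a dict with one key per column. The variable name `team` is an assumption made for illustration.

```python
import pandas as pd

# The sales-team table from above, one dict key per column
team = pd.DataFrame({
    'Name': ['Steve', 'Lia', 'Vin', 'Katie'],
    'Age': [32, 28, 45, 38],
    'Gender': ['Male', 'Female', 'Male', 'Female'],
    'Rating': [3.45, 4.6, 3.9, 2.78],
})

print(team)
# Age is stored as an integer type and Rating as a float type;
# Name and Gender hold strings
print(team.dtypes)
```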
A DataFrame is a two-dimensional data structure, i.e., data is aligned in
a tabular fashion in rows and columns.
Features of DataFrame

Columns can be of different types
Size-mutable
Labeled axes (rows and columns)
Arithmetic operations can be performed on rows and columns
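
The features listed above can be illustrated with a small sketch. The student names and marks below are hypothetical values invented for this example.

```python
import pandas as pd

# Hypothetical student marks, chosen only for illustration
students = pd.DataFrame(
    {'Maths': [80, 65, 90], 'Science': [70, 85, 95]},
    index=['Asha', 'Ben', 'Chitra'],   # labeled row axis
)

# Size-mutable: a new column can be added after construction,
# here computed by arithmetic on two existing columns
students['Total'] = students['Maths'] + students['Science']

# Arithmetic down the rows: the mean of each column across all students
column_means = students.mean()

print(students)
print(column_means)
```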
Let us assume that we are creating a data frame with student’s data.
pandas.DataFrame

A pandas DataFrame can be created using the following constructor −

pandas.DataFrame( data, index, columns, dtype, copy)

A pandas DataFrame can be created using various inputs like −

Lists
dict
Series
Numpy ndarrays
Another DataFrame
#import the pandas library and aliasing as pd
import pandas as pd
df = pd.DataFrame()
print(df)
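
Beyond lists and dicts, the other inputs named above (Series, NumPy ndarrays, another DataFrame) can be sketched as follows. The variable names and sample values are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# From a dict of Series: the Series labels become the row index
from_series = pd.DataFrame({'one': pd.Series([1, 2, 3], index=['a', 'b', 'c'])})

# From a 2-D NumPy ndarray: column names are supplied separately
from_array = pd.DataFrame(np.array([[1, 2], [3, 4]]), columns=['x', 'y'])

# From another DataFrame: copy=True gives the new frame its own data
from_frame = pd.DataFrame(from_array, copy=True)

print(from_series)
print(from_array)
print(from_frame)
```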

Create a DataFrame from Lists

The DataFrame can be created using a single list or a list of lists.
Example
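import pandas as pd

# A single list becomes a one-column DataFrame
data = [1, 2, 3, 4, 5]
df = pd.DataFrame(data)
print(df)

import pandas as pd

# A list of lists becomes one row per inner list
data = [['Alex', 10], ['Bob', 12], ['Clarke', 13]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
print(df)

import pandas as pd

data = [['Alex', 10], ['Bob', 12], ['Clarke', 13]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
# Cast only the numeric column; recent pandas versions raise an error if
# dtype=float is passed to the constructor while 'Name' holds strings
df['Age'] = df['Age'].astype(float)
print(df)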
Create a DataFrame from Dict of ndarrays / Lists
All the ndarrays must be of the same length. If an index is passed, its
length must equal the length of the arrays.

If no index is passed, then by default the index will be range(n), where n is
the array length.
import pandas as pd
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data)
print(df)
Let us now create an indexed DataFrame
import pandas as pd
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data, index=['rank1','rank2','rank3','rank4'])
print(df)
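
The length rule stated above can be checked directly. The following sketch passes an index that is one label short; the flag name `mismatch_rejected` is an assumption made for illustration.

```python
import pandas as pd

data = {'Name': ['Tom', 'Jack', 'Steve', 'Ricky'], 'Age': [28, 34, 29, 42]}

# Four rows of data but only three index labels: pandas rejects the mismatch
try:
    pd.DataFrame(data, index=['rank1', 'rank2', 'rank3'])
    mismatch_rejected = False
except ValueError as err:
    mismatch_rejected = True
    print("ValueError:", err)

# Without an index, the rows are labeled 0..n-1
default_index = list(pd.DataFrame(data).index)
print(default_index)
```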
Python Pandas - Merging/Joining
Pandas has full-featured, high-performance in-memory join operations that are
idiomatically very similar to those of relational databases such as SQL.
Pandas provides a single function, merge, as the entry point for all
standard database join operations between DataFrame objects −

pd.merge(left, right, how='inner', on=None, left_on=None,
         right_on=None, left_index=False, right_index=False, sort=False)
left − A DataFrame object.
right − Another DataFrame object.
on − Columns (names) to join on. Must be found in both the left and right DataFrame objects.
left_on − Columns from the left DataFrame to use as keys. Can either be column names or arrays
with length equal to the length of the DataFrame.
right_on − Columns from the right DataFrame to use as keys. Can either be column names or
arrays with length equal to the length of the DataFrame.
left_index − If True, use the index (row labels) from the left DataFrame as its join key(s). In case of
a DataFrame with a MultiIndex (hierarchical), the number of levels must match the number of join
keys from the right DataFrame.
right_index − Same usage as left_index for the right DataFrame.
how − One of 'left', 'right', 'outer', 'inner'. Defaults to 'inner'.
sort − Sort the result DataFrame by the join keys in lexicographical order. Defaults to False in
current pandas releases; leaving it off is substantially faster in many cases.
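
The left_on / right_on and left_index / right_index parameters described above might be used as in the following sketch. The tables `students` and `marks` and their column names are hypothetical, invented only to show keys with different names on each side.

```python
import pandas as pd

# Hypothetical tables whose key columns have different names
students = pd.DataFrame({'student_id': [1, 2, 3], 'Name': ['Alex', 'Amy', 'Allen']})
marks = pd.DataFrame({'sid': [2, 3, 4], 'Mark': [75, 88, 91]})

# left_on / right_on name the key column on each side
by_column = pd.merge(students, marks, left_on='student_id', right_on='sid')

# left_index / right_index join on the row labels instead
by_index = pd.merge(
    students.set_index('student_id'),
    marks.set_index('sid'),
    left_index=True,
    right_index=True,
)

print(by_column)
print(by_index)
```

Both calls use the default how='inner', so only the ids present in both tables survive.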
Let us now create two different DataFrames and perform the merging
operations on them.
# import the pandas library
import pandas as pd
left = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
    'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5']})
right = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
    'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5']})
print(left)
print(right)
Merge Two DataFrames on a Key
import pandas as pd
left = pd.DataFrame({
'id':[1,2,3,4,5],
'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
'subject_id':['sub1','sub2','sub4','sub6','sub5']})
right = pd.DataFrame({
'id':[1,2,3,4,5],
'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
'subject_id':['sub2','sub4','sub3','sub6','sub5']})
print(pd.merge(left, right, on='id'))
Merge Two DataFrames on Multiple Keys
import pandas as pd
left = pd.DataFrame({
'id':[1,2,3,4,5],
'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
'subject_id':['sub1','sub2','sub4','sub6','sub5']})
right = pd.DataFrame({
'id':[1,2,3,4,5],
'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
'subject_id':['sub2','sub4','sub3','sub6','sub5']})
print(pd.merge(left, right, on=['id', 'subject_id']))
Merge Using 'how' Argument
The how argument to merge specifies how to determine which keys are to be
included in the resulting table. If a key combination does not appear in either the
left or the right tables, the values in the joined table will be NA.

Here is a summary of the how options and their SQL equivalent names −

Merge Method   SQL Equivalent     Description

left           LEFT OUTER JOIN    Use keys from left object
right          RIGHT OUTER JOIN   Use keys from right object
outer          FULL OUTER JOIN    Use union of keys
inner          INNER JOIN         Use intersection of keys
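
The four options in the table can be compared side by side. This sketch reuses the Name and subject_id columns of the left and right DataFrames from this section; the dict name `keys_by_method` is an assumption made for illustration.

```python
import pandas as pd

left = pd.DataFrame({'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
                     'subject_id': ['sub1', 'sub2', 'sub4', 'sub6', 'sub5']})
right = pd.DataFrame({'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
                      'subject_id': ['sub2', 'sub4', 'sub3', 'sub6', 'sub5']})

# Collect the join keys that survive under each merge method
keys_by_method = {}
for method in ['left', 'right', 'outer', 'inner']:
    merged = pd.merge(left, right, on='subject_id', how=method)
    keys_by_method[method] = sorted(merged['subject_id'])
    print(method, keys_by_method[method])
```

Only 'outer' keeps both sub1 (left only) and sub3 (right only); 'inner' drops both.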
Left Join
import pandas as pd
left = pd.DataFrame({
'id':[1,2,3,4,5],
'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
'subject_id':['sub1','sub2','sub4','sub6','sub5']})
right = pd.DataFrame({
'id':[1,2,3,4,5],
'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
'subject_id':['sub2','sub4','sub3','sub6','sub5']})
print(pd.merge(left, right, on='subject_id', how='left'))

Its output is as follows −

Name_x id_x subject_id Name_y id_y
0 Alex 1 sub1 NaN NaN
1 Amy 2 sub2 Billy 1.0
2 Allen 3 sub4 Brian 2.0
3 Alice 4 sub6 Bryce 4.0
4 Ayoung 5 sub5 Betty 5.0
Right Join
import pandas as pd
left = pd.DataFrame({
'id':[1,2,3,4,5],
'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
'subject_id':['sub1','sub2','sub4','sub6','sub5']})
right = pd.DataFrame({
'id':[1,2,3,4,5],
'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
'subject_id':['sub2','sub4','sub3','sub6','sub5']})
print(pd.merge(left, right, on='subject_id', how='right'))

Its output is as follows −

Name_x id_x subject_id Name_y id_y

0 Amy 2.0 sub2 Billy 1
1 Allen 3.0 sub4 Brian 2
2 Alice 4.0 sub6 Bryce 4
3 Ayoung 5.0 sub5 Betty 5
4 NaN NaN sub3 Bran 3
Outer Join
import pandas as pd
left = pd.DataFrame({
'id':[1,2,3,4,5],
'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
'subject_id':['sub1','sub2','sub4','sub6','sub5']})
right = pd.DataFrame({
'id':[1,2,3,4,5],
'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
'subject_id':['sub2','sub4','sub3','sub6','sub5']})
print(pd.merge(left, right, how='outer', on='subject_id'))

Its output is as follows −

Name_x id_x subject_id Name_y id_y
0 Alex 1.0 sub1 NaN NaN
1 Amy 2.0 sub2 Billy 1.0
2 Allen 3.0 sub4 Brian 2.0
3 Alice 4.0 sub6 Bryce 4.0
4 Ayoung 5.0 sub5 Betty 5.0
5 NaN NaN sub3 Bran 3.0
Inner Join
An inner join keeps only the keys that appear in both DataFrames.
Separately, the DataFrame.join method performs its join on the index and honors the object on
which it is called, so a.join(b) is not equal to b.join(a).
import pandas as pd
left = pd.DataFrame({
'id':[1,2,3,4,5],
'Name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
'subject_id':['sub1','sub2','sub4','sub6','sub5']})
right = pd.DataFrame({
'id':[1,2,3,4,5],
'Name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
'subject_id':['sub2','sub4','sub3','sub6','sub5']})
print(pd.merge(left, right, on='subject_id', how='inner'))
Its output is as follows −
Name_x id_x subject_id Name_y id_y
0 Amy 2 sub2 Billy 1
1 Allen 3 sub4 Brian 2
2 Alice 4 sub6 Bryce 4
3 Ayoung 5 sub5 Betty 5
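
The remark that a.join(b) is not equal to b.join(a) can be seen in a small sketch. The frames `a` and `b` and their labels are hypothetical values invented for illustration.

```python
import pandas as pd

# Hypothetical frames that share only the row label 'k2'
a = pd.DataFrame({'x': [1, 2]}, index=['k1', 'k2'])
b = pd.DataFrame({'y': [3, 4]}, index=['k2', 'k3'])

# join() matches on the index and, by default, keeps the caller's rows
a_then_b = a.join(b)   # rows k1, k2
b_then_a = b.join(a)   # rows k2, k3

# An inner join on the index keeps only the shared label
shared = a.join(b, how='inner')

print(a_then_b)
print(b_then_a)
print(shared)
```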
Reshaping/Pivoting
Pandas has operations for rearranging tabular data, known as
reshaping or pivoting operations. For example, hierarchical indexing
provides a consistent way to rearrange data in a DataFrame. There are
two primary functions in hierarchical indexing: stack() rotates or pivots
data from columns to rows, and unstack() pivots from rows back to columns.
The pivot() function is used to reshape a given DataFrame organized by
given index / column values.
We can reshape a DataFrame using the melt(), stack(), unstack() and pivot()
functions.
Define a DataFrame.
Apply the melt() function to convert the columns of a wide DataFrame into rows. It is
called as follows:
df.melt()
Example
import pandas as pd
df = pd.DataFrame({'Id':[1,2,3],'Age':[13,14,13],'Mark':[80,90,85]})
print("Dataframe is:\n",df)
print(df.melt())
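
Calling melt() with no arguments unpivots every column, including Id. A common variation, sketched here, keeps one column as an identifier through the id_vars parameter; the names `Attribute` and `Value` are labels chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({'Id': [1, 2, 3], 'Age': [13, 14, 13], 'Mark': [80, 90, 85]})

# Keep Id as an identifier; unpivot Age and Mark into (Attribute, Value) pairs
long_df = df.melt(id_vars='Id', var_name='Attribute', value_name='Value')
print(long_df)
```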
Define a DataFrame.
Apply the stack() function to add a level to the index of a DataFrame. It is
called as follows:
df.stack().to_frame()
To reverse the operation, call unstack() on the stacked result:
df.stack().unstack()
Calling unstack() directly on a DataFrame whose index has a single level returns a Series
indexed by (column label, row label):
df.unstack().to_frame()
The to_frame() function converts the given Series object to a DataFrame.
Example
import pandas as pd
df = pd.DataFrame({'Id':[1,2,3],'Age':[13,14,13],'Mark':[80,90,85]})
print("Dataframe is:\n",df)
print(df.stack().to_frame())
print(df.unstack().to_frame())
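
A short round-trip sketch shows how stack() and unstack() relate. The variable names `stacked`, `restored` and `same_values` are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Id': [1, 2, 3], 'Age': [13, 14, 13], 'Mark': [80, 90, 85]})

stacked = df.stack()            # Series indexed by (row label, column label)
restored = stacked.unstack()    # column labels move back to the columns axis

print(stacked.index.nlevels)    # number of levels in the stacked index
print(restored)

# Compare values column by column, independent of column order
same_values = all(restored[col].tolist() == df[col].tolist() for col in df.columns)
print(same_values)
```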
Define a DataFrame.
Apply the pivot() function to reshape the DataFrame based on the Id column:
df.pivot(columns='Id')
Example
import pandas as pd
df = pd.DataFrame({'Id':[1,2,3],'Age':[13,14,13],'Mark':[80,90,85]})
print("Dataframe is:\n",df)
print(df.pivot(columns='Id'))
import pandas as pd
# Create a DataFrame
d = {'countries': ['A', 'B', 'C', 'A', 'B', 'C'],
     'metrics': ['population_in_million', 'population_in_million', 'population_in_million',
                 'gdp_percapita', 'gdp_percapita', 'gdp_percapita'],
     'values': [100, 200, 120, 2000, 7000, 15000]}
df = pd.DataFrame(d, columns=['countries', 'metrics', 'values'])
print(df)
# Long to wide: countries become the index, metrics become the columns
df2 = df.pivot(index='countries', columns='metrics', values='values')
print(df2)

The pivot() function reshapes the data from long to wide in pandas.
The countries column is used as the index.
The values of the metrics column are used as column names, and the values of the
values column are used as the cell values.
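
Since pivot() goes from long to wide and melt() goes from wide to long, the two can be chained as a round trip. The following is a sketch under that assumption, reusing the countries data; the variable names are chosen for illustration.

```python
import pandas as pd

long_df = pd.DataFrame({
    'countries': ['A', 'B', 'C', 'A', 'B', 'C'],
    'metrics': ['population_in_million'] * 3 + ['gdp_percapita'] * 3,
    'values': [100, 200, 120, 2000, 7000, 15000],
})

# Long to wide: one row per country, one column per metric
wide_df = long_df.pivot(index='countries', columns='metrics', values='values')

# Wide back to long: melt needs 'countries' as a regular column again
back_to_long = wide_df.reset_index().melt(
    id_vars='countries', var_name='metrics', value_name='values')

print(wide_df)
print(back_to_long)
```

The row order after the round trip may differ from the original, but the same six (country, metric, value) records are present.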
Python RegEx
A RegEx, or Regular Expression, is a sequence of characters that forms a
search pattern.

RegEx can be used to check if a string contains the specified search pattern.
RegEx Module

Python has a built-in package called re, which can be used to work with
Regular Expressions.

Import the re module:

import re
Search the string to see if it starts with "The" and ends with "Spain":
import re
txt = "The rain in Spain"
x = re.search("^The.*Spain$", txt)
^   beginning of the string
$   end of the string
.   any character (except a newline)
*   zero or more occurrences of the preceding element
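
The search call above returns a Match object, or None when nothing matches. A minimal sketch of inspecting the result follows; the second input string is an assumption added to show the non-matching case.

```python
import re

txt = "The rain in Spain"
x = re.search("^The.*Spain$", txt)

if x:
    print(x.group())   # the text that matched the pattern
    print(x.span())    # (start, end) positions of the match
else:
    print("No match")

# A string that does not end with "Spain" gives None instead of a Match
y = re.search("^The.*Spain$", "The rain in Spain falls")
print(y)
```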
RegEx Functions
The re module offers a set of functions that allows us to search a string
for a match:
Function   Description
findall    Returns a list containing all matches
search     Returns a Match object if there is a match anywhere in the string
split      Returns a list where the string has been split at each match
sub        Replaces one or many matches with a string
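
The examples in these notes mostly use findall, so here is a brief sketch of the other three functions from the table. The replacement character "9" is an arbitrary choice for illustration.

```python
import re

txt = "The rain in Spain"

words = re.split(r"\s", txt)          # split at each whitespace character
replaced = re.sub(r"\s", "9", txt)    # replace each whitespace character with "9"
first_space = re.search(r"\s", txt)   # Match object for the first whitespace

print(words)
print(replaced)
print(first_space.start())
```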
Metacharacters
Metacharacters are characters with a special meaning:
import re
txt = "The rain in Spain"
#Find all lower case characters alphabetically between "a" and "m":
x = re.findall("[a-m]", txt)
print(x)

import re
txt = "That will be 59 dollars"
#Find all digit characters:
x = re.findall(r"\d", txt)
print(x)

import re
txt = "hello planet"
#Search for a sequence that starts with "he", followed by two (any) characters, and an "o":
x = re.findall("he..o", txt)
print(x)
import re
txt = "hello planet"
#Check if the string starts with 'hello':
x = re.findall("^hello", txt)
if x:
    print("Yes, the string starts with 'hello'")
else:
    print("No match")

import re
txt = "hello planet"
#Check if the string ends with 'planet':
x = re.findall("planet$", txt)
if x:
    print("Yes, the string ends with 'planet'")
else:
    print("No match")
import re
txt = "hello planet"
#Search for a sequence that starts with "he", followed by 0 or more (any) characters, and an "o":
x = re.findall("he.*o", txt)
print(x)

import re
txt = "hello planet"
#Search for a sequence that starts with "he", followed by 1 or more (any) characters, and an "o":
x = re.findall("he.+o", txt)
print(x)

import re
txt = "hello planet"
#Search for a sequence that starts with "he", followed by 0 or 1 (any) character, and an "o":
x = re.findall("he.?o", txt)
print(x)
#This time we got no match, because there were not zero, not one, but two characters between
#"he" and the "o"
import re
txt = "hello planet"
#Search for a sequence that starts with "he", followed by exactly 2 (any) characters,
#and an "o":
x = re.findall("he.{2}o", txt)
print(x)

import re
txt = "The rain in Spain falls mainly in the plain!"
#Check if the string contains either "falls" or "stays":
x = re.findall("falls|stays", txt)
print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No match")
Special Sequences
A special sequence is a \ followed by one of a fixed set of characters, and has a special
meaning; for example, \d matches any digit.
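
A few commonly used special sequences from Python's re module are sketched below. The selection (\d, \s, \w, \b) is an assumption made for illustration; the notes do not list which sequences they cover.

```python
import re

txt = "That will be 59 dollars"

digits = re.findall(r"\d", txt)           # \d matches any digit
spaces = re.findall(r"\s", txt)           # \s matches any whitespace character
words = re.findall(r"\w+", txt)           # \w matches letters, digits and underscore
starts_do = re.findall(r"\bdo\w*", txt)   # \b matches at a word boundary

print(digits)
print(len(spaces))
print(words)
print(starts_do)
```

Raw strings (the r prefix) keep Python from interpreting the backslash before the regex engine sees it.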
Sets
A set is a group of characters inside a pair of square brackets [] with a special meaning; for
example, [a-m] matches any lowercase character alphabetically between a and m.
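
Three typical forms of a set are sketched below: an explicit list of characters, a range, and a negated set. The specific patterns are chosen here for illustration.

```python
import re

txt = "The rain in Spain"

any_of = re.findall("[arn]", txt)      # one of the listed characters a, r or n
in_range = re.findall("[a-m]", txt)    # any lowercase character from a to m
negated = re.findall("[^a-z ]", txt)   # anything except lowercase letters and spaces

print(any_of)
print(in_range)
print(negated)
```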
