Experiment – 6: Perform following operations using pandas
Creating dataframe
Concat()
Setting conditions
Adding a new column
Creating dataframe
Pandas DataFrames are two-dimensional, tabular data structures with labeled axes (rows and
columns). Here are several ways to create them:
Program – 1: From a Dictionary
import pandas as pd
data = {
"names": ['A', 'B', 'C'],
"rollno": [101, 102, 103]
}
df = pd.DataFrame(data)
print(df)
Output:
names rollno
0 A 50
1 B 40
2 C 45
Program – 2: From a json
import pandas as pd
data = {
"names":{
"0":"A",
"1":"B",
"2":"C",
},
" rollno":{
"0":101,
"1":102,
"2":103,
}
}
df = pd.DataFrame(data)
print(df)
Output:
names rollno
0 A 50
1 B 40
2 C 45
Program – 3: From a NumPy Array
import numpy as np
import pandas as pd
data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Changing row labels(index’s) and adding column labels
df = pd.DataFrame(data,index=["I","II","III"], columns=['A', 'B', 'C'])
print(df)
Output:
A B C
I 1 2 3
II 4 5 6
III 7 8 9
Concat()
The pd.concat() function is used to concatenate (combine) pandas objects (DataFrames or Series)
along a particular axis (rows or columns). It's one of the primary tools for combining data in pandas.
Program -1: Vertical Concatenation (Stacking DataFrames)
# Concatenating DataFrames with Same Columns
import pandas as pd
d1 = {
'A': [1, 2, 3],
'B': [4, 5, 6]
}
d2 = {
'A': [7, 8, 9],
'B': [10, 11, 12]
}
df1 = pd.DataFrame(d1)
df2 = pd.DataFrame(d2)
result = pd.concat([df1, df2])
print(result)
Output:
A B
0 1 4
1 2 5
2 3 6
0 7 10
1 8 11
2 9 12
Program -2 : Avoiding duplicate indexes
import pandas as pd
d1 = {
'A': [1, 2, 3],
'B': [4, 5, 6]
}
d2 = {
'A': [7, 8, 9],
'B': [10, 11, 12]
}
df1 = pd.DataFrame(d1)
df2 = pd.DataFrame(d2)
result = pd.concat([df1, df2],ignore_index=True)
print(result)
Output:
A B
0 1 4
1 2 5
2 3 6
3 7 10
4 8 11
5 9 12
Program -3 : Avoiding duplicate indexes
import pandas as pd
d1 = {
'A': [1, 2, 3],
'B': [4, 5, 6]
}
d2 = {
'A': [7, 8, 9],
'B': [10, 11, 12]
}
df1 = pd.DataFrame(d1,index=[0, 1, 2])
df2 = pd.DataFrame(d2,index=[3, 4, 5])
result = pd.concat([df1, df2])
print(result)
Output:
A B
0 1 4
1 2 5
2 3 6
3 7 10
4 8 11
5 9 12
Program – 4:
# Concatenating DataFrames with Different Column Names
import pandas as pd
d1 = {
'A': [1, 2, 3],
'B': [4, 5, 6]
}
d2 = {
'C': [7, 8, 9],
'D': [10, 11, 12]
}
df1 = pd.DataFrame(d1)
df2 = pd.DataFrame(d2)
result = pd.concat([df1, df2],ignore_index=True) # Handling Duplicate Index Values
print(result)
Output:
A B C D
0 1.0 4.0 NaN NaN
1 2.0 5.0 NaN NaN
2 3.0 6.0 NaN NaN
3 NaN NaN 7.0 10.0
4 NaN NaN 8.0 11.0
5 NaN NaN 9.0 12.0
Program – 5: # Concatenating DataFrames Horizontally
import pandas as pd
d1 = {
'A': [1, 2, 3],
'B': [4, 5, 6]
}
d2 = {
'C': [7, 8, 9],
'D': [10, 11, 12]
}
df1 = pd.DataFrame(d1)
df2 = pd.DataFrame(d2)
result = pd.concat([df1, df2],axis=1)
print(result)
Output:
A B C D
0 1 4 7 10
1 2 5 8 11
2 3 6 9 12
Setting conditions:
Pandas provides several ways to set conditions to filter or modify DataFrames and Series. Here are
the main methods:
Program:
1.Boolean Indexing: The most common way to filter data based on conditions:
Program:
import pandas as pd
data = {
'A': [1, 2, 3, 4],
'B': ['a', 'b', 'c', 'd'],
'C': [10, 20, 30, 40]
}
# Create a sample DataFrame
df = pd.DataFrame(data)
print("Original Data\n",df)
# Filter rows where column A > 2
r = df[df['A'] > 2]
print("Simple Condition:\n",f)
# Multiple conditions (use & for AND, | for OR, ~ for NOT)
r1 = df[(df['A'] > 1) & (df['C'] < 40)]
print("Compound Condition:\n",f1)
Output:
Original Data
A B C
0 1 a 10
1 2 b 20
2 3 c 30
3 4 d 40
Simple Condition:
A B C
2 3 c 30
3 4 d 40
Compound Condition:
A B C
1 2 b 20
2 3 c 30
2. query() Method
Filter using a query string:
result = df.query('A > 2 and C != 40')
result
Output:
A B C
A B C
2 3 c 30
3. where() Method
Keeps values where condition is True, replaces others with NaN (or specified value):
# Replace values not meeting condition with NaN
result = df.where(df['A'] > 2)
result
Output:
A B C
0 NaN NaN NaN
1 NaN NaN NaN
2 3.0 c 30.0
3 4.0 d 40.0
# Replace with a specific value
result = df.where(df['A'] > 2, other=-1)
result
Output:
A B C
0 -1 -1 -1
1 -1 -1 -1
2 3 c 30
3 4 d 40
4. loc[] for Conditional Selection and Assignment
Select or modify data based on conditions:
# Select data
result = df.loc[df['A'] > 2, ['B', 'C']]
result
Output:
B C
2 c 30
3 d 40
# Modify data based on condition
df.loc[df['A'] > 2, 'C'] = 99
df
Output:
A B C
A B C
0 1 a 10
1 2 b 20
2 3 c 99
3 4 d 99
5. String Conditions
For string operations:
# Rows where column B starts with 'a'
result = df[df['B'].str.startswith('a')]
result
Output:
A B C
0 1 a 10
# Rows where column B ends with 'd'
result = df[df['B'].str.startswith('d')]
result
Output:
A B C
3 4 d 99
# Contains pattern
result = df[df['B'].str.contains('b|c')]
result
Output:
A B C
1 2 b 20
2 3 c 99
Adding a new column:
There are several ways to add a new column to a pandas DataFrame. Here are the most common
methods:
1. Direct Assignment:
The simplest way to add a new column:
Program:
import pandas as pd
data = {
'A': [1, 2, 3],
'B': [4, 5, 6]
}
df = pd.DataFrame(data)
# Add new column 'C' with a scalar value
df['C'] = 10 # All rows get value 10
df
Output:
A B C
0 1 4 10
1 2 5 10
2 3 6 10
# Add new column with a list/array
df['D'] = [7, 8, 9]
df
Output:
A B C D
0 1 4 10 7
1 2 5 10 8
2 3 6 10 9
# Add column based on calculation
df['E'] = df['A'] + df['B']
df
Output:
A B C D E
0 1 4 10 7 5
1 2 5 10 8 7
2 3 6 10 9 9
2. Insert at Specific Position
# Insert column 'I' at position 1 (0-based index)
df.insert(1, 'I', [100, 200, 300])
df
Output:
A I B C D E
0 1 100 4 10 7 5
1 2 200 5 10 8 7
2 3 300 6 10 9 9
3. Adding Columns from Other DataFrames
df2 = pd.DataFrame({'M': [11, 12, 13]})
df['M'] = df2['M'] # Must have same length/index
df
Output:
A I B C D E M
0 1 100 4 10 7 5 11
1 2 200 5 10 8 7 12
2 3 300 6 10 9 9 13
Experiment – 7: Perform following operations using pandas
• Filling NaN with string
• Sorting based on column values
• groupby()
Filling NaN with string:
Important Points:
1. After filling NaN with strings, numeric columns will be converted to object dtype
2. Use inplace=True parameter to modify the original DataFrame instead of creating a copy
3. For mixed-type columns, be careful about type consistency
4. Consider using pd.NA instead of np.nan for newer pandas versions with string data
There are several ways to replace NaN (missing) values with strings in a pandas DataFrame. Here
are the most common methods:
1. Using fillna() Method:
Program:
import pandas as pd
data = {
'A': [1, 2, np.nan, 4],
'B': ['x', np.nan, 'y', np.nan],
'C': [np.nan, 'a', 'b', 'c']
}
# Create sample DataFrame with NaN values
df = pd.DataFrame(data)
print(df)
# Fill all NaN values with a string
result = df.fillna('missing')
result
Output:
A B C
0 1.0 x NaN
1 2.0 NaN a
2 NaN y b
3 4.0 NaN c