day09-pandas-data-manipulation
February 2, 2024
Pandas Data Manipulation – by Punith V T
[1]: import pandas as pd
[58]: #sample data
data = { "A" : [1,2,3,4,5,10],
"B": ["bengaluru","channai","delhi","tumkur","coimbator","bengaluru"]}
df=pd.DataFrame(data)
df
[58]: A B
0 1 bengaluru
1 2 channai
2 3 delhi
3 4 tumkur
4 5 coimbator
5 10 bengaluru
1. Filtering Data: Filtering rows based on a condition.
[12]: gr2= df[df["A"]>2]
gr2
[12]: A B
2 3 delhi
3 4 tumkur
4 5 coimbator
5 10 bengaluru
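Filters often need more than one condition. The sketch below reuses the same sample data and shows three common ways to combine conditions: `&` / `|` with parenthesised comparisons, `isin()` for membership in a list, and `query()` with a string expression. The variable names and the city list are illustrative only.

```python
import pandas as pd

# Same sample data as in the cell above
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, 10],
    "B": ["bengaluru", "channai", "delhi", "tumkur", "coimbator", "bengaluru"],
})

# Combine conditions with & (and) or | (or); each comparison needs its own parentheses
both = df[(df["A"] > 2) & (df["B"] == "bengaluru")]

# isin() keeps rows whose value appears in the given list (illustrative city list)
cities = df[df["B"].isin(["delhi", "tumkur"])]

# query() expresses the same filter as `both` as a string expression
queried = df.query("A > 2 and B == 'bengaluru'")

print(both)
print(cities)
```

Only the row with `A = 10` satisfies both conditions, so `both` and `queried` contain the same single row.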
2. Selecting Columns:
Selecting specific columns from a DataFrame.
[20]: sc = df[["B"]]
print(sc)
type(sc)
B
0 bengaluru
1 channai
2 delhi
3 tumkur
4 coimbator
5 bengaluru
[20]: pandas.core.frame.DataFrame
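The double brackets in `df[["B"]]` are why the result above is a DataFrame. A small sketch contrasting this with single-bracket selection and with label/position-based selection via `.loc` and `.iloc` (variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, 10],
    "B": ["bengaluru", "channai", "delhi", "tumkur", "coimbator", "bengaluru"],
})

# Single brackets return a Series; double brackets return a one-column DataFrame
col_series = df["B"]
col_frame = df[["B"]]

# Several columns at once, in the order they are listed
both_cols = df[["B", "A"]]

# .loc selects by label (the slice 0:1 is inclusive); .iloc selects by integer position
first_two = df.loc[0:1, ["A", "B"]]
last_row = df.iloc[-1]

print(type(col_series).__name__, type(col_frame).__name__)
```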
3. Sorting Data:
Sorting DataFrame by one or more columns.
[29]: sort = df.sort_values(by="B")
sort
[29]: A B
0 1 bengaluru
5 10 bengaluru
1 2 channai
4 5 coimbator
2 3 delhi
3 4 tumkur
[31]: # Descending order: pass ascending=False
desort = df.sort_values(by="B", ascending=False)
desort
[31]: A B
3 4 tumkur
2 3 delhi
4 5 coimbator
1 2 channai
0 1 bengaluru
5 10 bengaluru
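Sorting by more than one column takes a list for `by`, and `ascending` can take a matching list. A minimal sketch on the same data, sorting cities alphabetically and, within the same city, by `A` from largest to smallest:

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, 10],
    "B": ["bengaluru", "channai", "delhi", "tumkur", "coimbator", "bengaluru"],
})

# Sort by B ascending, then by A descending among rows with the same B
multi = df.sort_values(by=["B", "A"], ascending=[True, False])

# nlargest() is a shortcut for "top n rows by one column"
top2 = df.nlargest(2, "A")

# sort_index() restores the original row order
restored = multi.sort_index()

print(multi)
```

The two `bengaluru` rows come first, with `A = 10` ahead of `A = 1`.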
4. Aggregating Data:
Calculating summary statistics like mean, sum, count, etc.
[45]: meanA =df["A"].mean()
meanA
[45]: 4.166666666666667
[46]: valueCountA =df["A"].value_counts()
valueCountA
[46]: A
1 1
2 1
3 1
4 1
5 1
10 1
Name: count, dtype: int64
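Several statistics can be computed in one call with `agg()`, passing the names of the aggregations as a list. A short sketch on the same data (the choice of statistics is illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, 10],
    "B": ["bengaluru", "channai", "delhi", "tumkur", "coimbator", "bengaluru"],
})

# agg() returns a Series indexed by the statistic name
stats = df["A"].agg(["mean", "sum", "count", "min", "max"])

# nunique() counts distinct values; value_counts() counts each value
distinct_cities = df["B"].nunique()
city_counts = df["B"].value_counts()

print(stats)
print(city_counts)
```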
5. Handling Missing Data:
Dealing with missing values in your DataFrame.
[51]: import pandas as pd
# sample
data_with_missing ={ "A": [1,2,3,None,4],
"B": ["a","b",None,"d","e"]}
df_miss=pd.DataFrame(data_with_missing)
df_miss.dropna()
[51]: A B
0 1.0 a
1 2.0 b
4 4.0 e
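`dropna()` discards every row with any missing value. A sketch of two alternatives on the same `df_miss` data: filling the gaps per column, or dropping only rows where a particular column is missing. Filling `A` with its mean and `B` with `"unknown"` are illustrative choices, not a general rule.

```python
import pandas as pd

df_miss = pd.DataFrame({"A": [1, 2, 3, None, 4],
                        "B": ["a", "b", None, "d", "e"]})

# Count missing values in each column
missing_per_col = df_miss.isna().sum()

# Fill instead of drop: numeric column with its mean, text column with a placeholder
filled = df_miss.fillna({"A": df_miss["A"].mean(), "B": "unknown"})

# Drop only the rows where column A is missing
drop_a_only = df_miss.dropna(subset=["A"])

print(filled)
```

The mean of the non-missing values in `A` is `(1 + 2 + 3 + 4) / 4 = 2.5`, which fills row 3.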
6. Merging Data:
Joining two DataFrames on a shared key column.
[52]: # Create two DataFrames
df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'value1': [10, 20, 30]})
df2 = pd.DataFrame({'key': ['B', 'C', 'D'], 'value2': [40, 50, 60]})
# Merge based on 'key' column
merged_df = pd.merge(df1, df2, on='key', how='inner')
print(merged_df)
key value1 value2
0 B 20 40
1 C 30 50
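The `how` argument controls which keys survive the merge. A sketch with the same `df1` and `df2`: a left join keeps every row of `df1`, and an outer join keeps keys from both sides; `indicator=True` adds a `_merge` column recording where each row came from.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'value1': [10, 20, 30]})
df2 = pd.DataFrame({'key': ['B', 'C', 'D'], 'value2': [40, 50, 60]})

# how='left': all rows of df1; value2 is NaN where df2 has no matching key
left = pd.merge(df1, df2, on='key', how='left')

# how='outer': union of keys; _merge is 'left_only', 'right_only' or 'both'
outer = pd.merge(df1, df2, on='key', how='outer', indicator=True)

print(left)
print(outer)
```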
7. Grouping and Aggregating Data:
Grouping data by one or more columns and applying aggregate functions.
[62]: # Group by 'B' and calculate the sum of 'A' for each group
group_df = df.groupby('B')["A"].sum().reset_index()
print(group_df)
B A
0 bengaluru 11
1 channai 2
2 coimbator 5
3 delhi 3
4 tumkur 4
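To compute several statistics per group at once, `agg()` accepts named aggregations of the form `new_name=(column, function)`. A minimal sketch on the same data; the output names `total_A`, `mean_A` and `rows` are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, 10],
    "B": ["bengaluru", "channai", "delhi", "tumkur", "coimbator", "bengaluru"],
})

# One row per city, with sum, mean and row count of A
summary = df.groupby("B").agg(
    total_A=("A", "sum"),
    mean_A=("A", "mean"),
    rows=("A", "count"),
).reset_index()

print(summary)
```

`bengaluru` appears twice, so its row shows a total of 11, a mean of 5.5 and a count of 2.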
8. Pivot Tables:
Creating pivot tables to summarize and reshape data.
[63]: # Create a pivot table to show the mean 'A' for each 'B' category
pivot_table = df.pivot_table(values='A', index='B', aggfunc='mean')
print(pivot_table)
A
B
bengaluru 5.5
channai 2.0
coimbator 5.0
delhi 3.0
tumkur 4.0
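`pivot_table` also accepts a list of aggregation functions, and `margins=True` appends an `All` row computed over the whole column. A sketch on the same data; with a list of functions the result columns become a two-level index such as `("mean", "A")`:

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, 10],
    "B": ["bengaluru", "channai", "delhi", "tumkur", "coimbator", "bengaluru"],
})

# Mean and count of A per city, plus an overall "All" row
pt = df.pivot_table(values="A", index="B", aggfunc=["mean", "count"], margins=True)

print(pt)
```

The `All` row uses all six values of `A`: a count of 6 and a mean of 25 / 6.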
9. Combining Data:
Concatenating or appending multiple DataFrames vertically or horizontally.
[65]: # Concatenate two DataFrames horizontally
df_concatenated = pd.concat([df1, df2], axis=1)
print(df_concatenated)
key value1 key value2
0 A 10 B 40
1 B 20 C 50
2 C 30 D 60
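[69]: # Append one DataFrame to another (stack rows vertically)
# DataFrame.append was removed in pandas 2.0; pd.concat is the public replacement
df_appended = pd.concat([df1, df2], ignore_index=True)
print(df_appended)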
key value1 value2
0 A 10.0 NaN
1 B 20.0 NaN
2 C 30.0 NaN
3 B NaN 40.0
4 C NaN 50.0
5 D NaN 60.0
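`pd.concat` has two options that are handy when stacking frames with different columns: `join='inner'` keeps only the shared columns, and `keys=` labels each input block in an outer index level. A sketch with the same `df1` and `df2` (the labels `'first'` and `'second'` are illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'value1': [10, 20, 30]})
df2 = pd.DataFrame({'key': ['B', 'C', 'D'], 'value2': [40, 50, 60]})

# Keep only columns present in both frames (here just 'key')
shared_cols = pd.concat([df1, df2], join='inner', ignore_index=True)

# Label each block so the source of every row stays visible
labelled = pd.concat([df1, df2], keys=['first', 'second'])

print(shared_cols)
print(labelled.loc['second'])
```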
10. Applying Functions:
Applying a custom function to the data.
[70]: def square(x):
    return x * x
# Apply the custom function to 'A' column
df["sq_A"]= df["A"].apply(square)
df
[70]: A B sq_A
0 1 bengaluru 1
1 2 channai 4
2 3 delhi 9
3 4 tumkur 16
4 5 coimbator 25
5 10 bengaluru 100
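For simple arithmetic, a vectorized expression gives the same result as `apply(square)` without calling a Python function per element. `apply` remains useful for one-off lambdas and, with `axis=1`, for functions that combine several columns of a row. A sketch on the same data; the new column names `B_upper` and `label` are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, 10],
    "B": ["bengaluru", "channai", "delhi", "tumkur", "coimbator", "bengaluru"],
})

# Vectorized equivalent of df["A"].apply(square)
df["sq_A"] = df["A"] ** 2

# apply() with a lambda on a single column
df["B_upper"] = df["B"].apply(lambda s: s.upper())

# Row-wise apply: the function receives each row as a Series
df["label"] = df.apply(lambda row: f"{row['B']}-{row['A']}", axis=1)

print(df)
```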