Day 4: Data Manipulation with Pandas
Introduction to Pandas: Pandas is a powerful Python library for data manipulation and
analysis. It provides data structures like Series and DataFrame, which are ideal for handling
structured data.
# Example of importing Pandas
import pandas as pd
# Creating a DataFrame from a dictionary
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 40],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']}
df = pd.DataFrame(data)
print(df)
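The introduction names Series alongside DataFrame, but only the DataFrame is shown. A minimal sketch of how the two relate, using illustrative labels and values that are not part of the original notes:

```python
import pandas as pd

# A Series is a one-dimensional labeled array (labels here are illustrative).
ages = pd.Series([25, 30, 35, 40],
                 index=['Alice', 'Bob', 'Charlie', 'David'],
                 name='Age')

# Each column of a DataFrame is itself a Series.
df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
age_column = df['Age']

print(type(age_column).__name__)  # Series
print(ages['Bob'])                # 30
print(df.shape)                   # (2, 2)
```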
DataFrames: Creation, Indexing, and Selection: DataFrames are two-dimensional labeled
data structures with columns of potentially different types. Indexing and selection operations
allow you to access specific rows and columns of a DataFrame.
# Example of indexing and selection in Pandas DataFrame
print(df['Name']) # Selecting a single column
print(df[['Name', 'Age']]) # Selecting multiple columns
print(df.iloc[0]) # Selecting a single row by integer position
print(df.loc[df['City'] == 'New York']) # Selecting rows based on a condition
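The difference between position-based and label-based selection is easy to miss with the default 0..n-1 index. The sketch below rebuilds the same sample DataFrame and selects rows and columns together. It is an illustration, not the only way to write these selections.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
})

# iloc is position-based: rows 0 and 1 (end excluded), first two columns.
first_two = df.iloc[0:2, 0:2]

# loc is label-based: with the default index the labels are 0..3,
# and a label slice includes its end point.
labels_0_to_2 = df.loc[0:2, ['Name', 'City']]

# loc also accepts a boolean mask together with a column label.
ny_names = df.loc[df['City'] == 'New York', 'Name']

print(first_two.shape)      # (2, 2)
print(labels_0_to_2.shape)  # (3, 2)
print(ny_names.tolist())    # ['Alice']
```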
Data Cleaning: Handling Missing Data, Data Transformation: Pandas provides methods
for handling missing data, such as dropping or filling missing values. It also supports various
data transformation operations like merging, reshaping, and aggregating data.
# Example of handling missing data and data transformation
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, None, 35, 40],
        'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']}
df = pd.DataFrame(data)
print(df.dropna()) # Drop rows with missing values
print(df.fillna(0)) # Fill missing values with a specified value
Output (the Age column is stored as float because it contains a missing value):
      Name   Age      City
0    Alice  25.0  New York
2  Charlie  35.0   Chicago
3    David  40.0   Houston
      Name   Age         City
0    Alice  25.0     New York
1      Bob   0.0  Los Angeles
2  Charlie  35.0      Chicago
3    David  40.0      Houston
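The section text also mentions merging, reshaping, and aggregating, which the example above does not cover. A minimal sketch of one call for each follows. The `regions` lookup table and the sample values are hypothetical and exist only for illustration.

```python
import pandas as pd

people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'City': ['New York', 'Los Angeles', 'Chicago', 'New York'],
    'Age': [25, 30, 35, 40],
})

# Hypothetical lookup table, not part of the original notes.
regions = pd.DataFrame({
    'City': ['New York', 'Los Angeles', 'Chicago'],
    'Region': ['East', 'West', 'Midwest'],
})

# Merging: attach a Region to every person via the shared City column.
merged = people.merge(regions, on='City', how='left')

# Aggregating: mean age per region (East 32.5, Midwest 35.0, West 30.0).
mean_age = merged.groupby('Region')['Age'].mean()

# Reshaping: wide to long, one row per (Name, Field) pair.
long_form = merged.melt(id_vars=['Name'],
                        value_vars=['City', 'Region'],
                        var_name='Field',
                        value_name='Value')

print(merged)
print(mean_age)
print(long_form.shape)  # (8, 3)
```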
Pandas is an essential tool for data manipulation and analysis in Python, and mastering its
usage is crucial for working with structured datasets effectively.
Introduction to Pandas
Pandas:
Powerful library for data manipulation and analysis
Built on top of NumPy
Install with pip install pandas
Importing Pandas:
import pandas as pd
DataFrames: Creation, Indexing, and Selection
Creating DataFrames:
From a dictionary:
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
From a CSV file:
df = pd.read_csv('data.csv')
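Since 'data.csv' is not supplied with these notes, the sketch below reads the same kind of content from an in-memory string instead of a file. The CSV text is made up for illustration. `pd.read_csv` accepts a file-like object as well as a path.

```python
import io
import pandas as pd

# Illustrative CSV content standing in for 'data.csv' (an assumption).
csv_text = """Name,Age,City
Alice,25,New York
Bob,,Los Angeles
Charlie,35,Chicago
"""

df = pd.read_csv(io.StringIO(csv_text))

# The empty Age field for Bob is read as a missing value (NaN),
# which makes the whole Age column float.
print(df)
print(df.dtypes)
```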
Indexing:
Default index starts at 0
Setting a custom index:
df.set_index('Name', inplace=True)
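Setting the index moves the chosen column out of the regular columns. The sketch below uses the non-inplace form so the original DataFrame is left untouched, and shows `reset_index` as the way back. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

# Without inplace=True, set_index returns a new DataFrame.
indexed = df.set_index('Name')

print(list(indexed.columns))      # ['Age', 'City']
print(indexed.loc['Bob', 'Age'])  # 30

# reset_index turns the index back into an ordinary column.
restored = indexed.reset_index()
print(list(restored.columns))     # ['Name', 'Age', 'City']
```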
Selection:
Selecting columns:
df['Age']
df[['Age', 'City']] # 'Name' became the index in the set_index step above, so it is no longer a column
Selecting rows:
df.iloc[0] # By position
df.loc['Alice'] # By index label (requires 'Name' to be the index)
Conditional selection:
df[df['Age'] > 30]
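Conditional selection often needs more than one condition. The sketch below, which keeps the default integer index, combines conditions with `&` and filters by membership with `isin`. Each condition needs its own parentheses because `&` and `|` bind more tightly than comparisons.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

# Single condition, as in the notes.
older_than_30 = df[df['Age'] > 30]

# Two conditions combined with & (and); use | for or.
mask = (df['Age'] >= 30) & (df['City'] != 'Chicago')
selected = df[mask]

# Membership test against a list of values.
in_cities = df[df['City'].isin(['New York', 'Chicago'])]

print(older_than_30['Name'].tolist())  # ['Charlie']
print(selected['Name'].tolist())       # ['Bob']
print(in_cities['Name'].tolist())      # ['Alice', 'Charlie']
```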
Data Cleaning: Handling Missing Data, Data Transformation
Handling Missing Data:
Identifying missing data:
df.isnull().sum()
Dropping missing data:
df.dropna(inplace=True)
Filling missing data:
df.fillna(value=0, inplace=True)
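The three calls above can be tried on a small DataFrame with gaps in more than one column. This sketch uses made-up sample data and the non-inplace forms, and shows two common variations: dropping rows only when a specific column is missing, and filling each column with its own value.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, None, 35, 40],
    'City': ['New York', 'Los Angeles', 'Chicago', None],
})

# Count missing values per column.
missing_counts = df.isnull().sum()
print(int(missing_counts['Age']), int(missing_counts['City']))  # 1 1

# Drop rows with any missing value, or only those missing Age.
complete_rows = df.dropna()
age_known = df.dropna(subset=['Age'])
print(len(complete_rows), len(age_known))  # 2 3

# Fill each column with its own replacement value.
filled = df.fillna({'Age': df['Age'].mean(), 'City': 'Unknown'})
print(filled)
```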
Data Transformation:
Adding new columns:
df['Age_in_10_years'] = df['Age'] + 10
Applying functions:
df['Age_squared'] = df['Age'].apply(lambda x: x**2)
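For simple arithmetic, a vectorized expression on the column gives the same result as `apply` and is usually faster. `apply` is more useful when the logic does not map onto a single operator. A minimal sketch, where the `Age_group` column and its labels are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
})

# Vectorized equivalent of df['Age'].apply(lambda x: x**2).
df['Age_squared'] = df['Age'] ** 2

# apply with custom logic per value (labels are illustrative).
df['Age_group'] = df['Age'].apply(
    lambda age: 'under 30' if age < 30 else '30 or over'
)

# assign returns a new DataFrame with the extra column.
df = df.assign(Age_in_10_years=df['Age'] + 10)

print(df['Age_squared'].tolist())      # [625, 900, 1225]
print(df['Age_group'].tolist())        # ['under 30', '30 or over', '30 or over']
print(df['Age_in_10_years'].tolist())  # [35, 40, 45]
```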
Example:
import pandas as pd
# Creating a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, None],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
# Handling missing data
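# Assign the result back; calling fillna(..., inplace=True) on a selected
# column may not update df in recent pandas versions
df['Age'] = df['Age'].fillna(df['Age'].mean())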
# Data transformation
df['Age_in_10_years'] = df['Age'] + 10
print(df)
This concludes the note for Day 4: Data Manipulation with Pandas.