Ex:2
Aim:
To explore a dataset in Python using DataFrame creation, info, shape, head, tail,
dtypes, describe, and grouping of data.
Procedure:
● Go to Start, search for Python IDLE, and open it.
● Open a new Python file and create a dataset.
● Type the program given below and save it as [Link]
● Run the program.
Program:
import pandas as pd
print("\n" + "="*50)
print(" DATA EXPLORATION REPORT")
print("="*50)
# 1. Create DataFrame
data = {
    "Name": ["Arun", "Beema", "Charles", "Divya", "Elango", "Farhan", "Guna", "Hari"],
    "Age": [21, 22, 23, 21, 22, 23, 21, 22],
    "City": ["Chennai", "Coimbatore", "Chennai", "Madurai", "Coimbatore", "Chennai",
             "Madurai", "Coimbatore"],
    "Score": [88, 92, 85, 90, 75, 89, 95, 78]
}
df = pd.DataFrame(data)
# Function to print section title
def section(title):
    print("\n" + "-"*50)
    print(title)
    print("-"*50)
section("DATAFRAME")
print(df)
print("\n=== INFO ===")
print("Rows :", df.shape[0])
print("Cols :", df.shape[1])
section("SHAPE (ROWS, COLUMNS)")
print(df.shape)
section("HEAD (FIRST 5 ROWS)")
print(df.head())
section("TAIL (LAST 5 ROWS)")
print(df.tail())
section("DATA TYPES")
print(df.dtypes)
section("DESCRIBE")
print(df.describe())
section("GROUP BY CITY (MEAN AGE & SCORE)")
print(df.groupby("City").mean(numeric_only=True))
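Note: The Aim lists info, while the program above prints the row and column counts by hand. The following is a minimal sketch, not part of the original exercise, of how the built-in `df.info()` and a multi-statistic `groupby` might be used instead. The four-row dataset is a shortened copy of the one above, and the variable name `summary` is an assumption made for illustration.

```python
import pandas as pd

# Shortened copy of the exercise dataset (assumed here for brevity)
df = pd.DataFrame({
    "Name": ["Arun", "Beema", "Charles", "Divya"],
    "Age": [21, 22, 23, 21],
    "City": ["Chennai", "Coimbatore", "Chennai", "Madurai"],
    "Score": [88, 92, 85, 90],
})

# info() prints the column names, non-null counts and dtypes itself
df.info()

# Several statistics of Score per city in a single call
summary = df.groupby("City")["Score"].agg(["mean", "min", "max"])
print(summary)
```

Here `agg` takes a list of statistic names, so the mean, minimum and maximum appear side by side for each city.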
Result:
Thus, a dataset has been explored successfully in Python using DataFrame creation,
info, shape, head, tail, dtypes, describe, and grouping of data.
Ex:3
Aim:
To extract important variables and remove useless variables from the dataset.
Procedure:
● Go to Start, search for Python IDLE, and open it.
● Open a new Python file and create a dataset.
● Type the program given below and save it as [Link]
● Run the program.
Program:
import pandas as pd
# Dataset
data = {
    'Name': ['Arun', 'Balu', 'Charan', 'Deepa'],
    'Age': [21, 22, 20, 23],
    'Marks': [85, 90, 78, 88],
    'City': ['Erode', 'Karur', 'Coimbatore', 'Chennai'],
    'Extra': ['x', 'y', 'z', 'p']
}
df = pd.DataFrame(data)
print("Original Dataset:")
print(df)
# IMPORTANT VARIABLES
important_columns = ['Name', 'Age', 'Marks']
df_important = df[important_columns]
print("\nImportant Variables Dataset:")
print(df_important)
# REMOVAL OF USELESS VARIABLES
useless_columns = ['Extra']
df_cleaned = df.drop(useless_columns, axis=1)
print("\nDataset After Removing Useless Variables:")
print(df_cleaned)
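Note: As a hedged variation on the removal step above, `drop` also accepts a `columns=` keyword in place of `axis=1`, and `errors="ignore"` skips any listed name that is not present. This is a sketch under those assumptions, not part of the original exercise; the column name "Remarks" is hypothetical and used only to show the behaviour.

```python
import pandas as pd

# Two-row version of the exercise dataset (assumed here for brevity)
df = pd.DataFrame({
    "Name": ["Arun", "Balu"],
    "Age": [21, 22],
    "Marks": [85, 90],
    "Extra": ["x", "y"],
})

# "Remarks" is a hypothetical column that does not exist in df;
# errors="ignore" skips it instead of raising a KeyError
df_cleaned = df.drop(columns=["Extra", "Remarks"], errors="ignore")
print(df_cleaned.columns.tolist())
```

`drop` returns a new DataFrame, so the original `df` still contains the Extra column afterwards.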
Result:
Thus, the important variables have been extracted and the useless variables
removed from the dataset successfully.
Ex:4
Aim:
To identify and fill missing values within the dataset.
Procedure:
● Go to Start, search for Python IDLE, and open it.
● Open a new Python file and create a dataset with missing values.
● Type the program given below and save it as [Link]
● Run the program.
Program:
import pandas as pd
import numpy as np
# Dataset with missing values
data = {
    'Name': ['Arun', 'Balu', 'Cheran', 'Deepa'],
    'Age': [21, np.nan, 20, 23],
    'Marks': [85, 90, np.nan, 88],
    'City': ['Tirupur', 'Erode', None, 'Chennai']
}
df = pd.DataFrame(data)
print("Original Dataset:")
print(df)
# 1. IDENTIFY MISSING VALUES
print("\nMissing Values Count in Each Column:")
print(df.isnull().sum())
# 2. FILL MISSING VALUES
# Fill numeric columns with mean
df['Age'] = df['Age'].fillna(df['Age'].mean())
df['Marks'] = df['Marks'].fillna(df['Marks'].mean())
# Fill text columns with a placeholder
df['City'] = df['City'].fillna("Unknown")
print("\nDataset After Filling Missing Values:")
print(df)
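Note: Filling with the mean can produce fractional values (the mean of the ages 21, 20 and 23 is about 21.33). The following is a minimal sketch, not part of the original exercise, of two alternatives: the median for a numeric column and the most frequent value (mode) for a text column. The small dataset is assumed for illustration only.

```python
import pandas as pd
import numpy as np

# Illustrative dataset with one missing value per column (assumed)
df = pd.DataFrame({
    "Age": [21, np.nan, 20, 23],
    "City": ["Erode", "Erode", None, "Chennai"],
})

# The median is less sensitive to extreme values than the mean
df["Age"] = df["Age"].fillna(df["Age"].median())

# mode() returns a Series; its first entry is the most frequent value
df["City"] = df["City"].fillna(df["City"].mode()[0])

print(df)
# Total number of missing values remaining in the whole DataFrame
print(df.isnull().sum().sum())
```

Which fill strategy suits a column depends on the data; the placeholder "Unknown" used in the program above remains a reasonable choice when no value is clearly most frequent.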
Result:
Thus, missing values in the dataset were successfully identified and filled.