Data Visualization Ex2 - Ex4

The document outlines three exercises focused on data manipulation using Python and pandas. Exercise 2 involves exploring a dataset with various DataFrame methods, Exercise 3 focuses on extracting important variables and removing unnecessary ones, and Exercise 4 addresses identifying and filling missing values in a dataset. Each exercise includes a clear aim, procedure, and program code, demonstrating practical applications of data handling in Python.

Ex:2

Aim:
To explore a dataset in Python using a pandas DataFrame and its info, shape,
head, tail, dtypes, describe, and groupby features.

Procedure:
● Go to Start, search for Python IDLE, and open it.
● Open a new Python file and create a dataset.
● Type the program given below and save it as [Link]
● Run the program.

Program:

import pandas as pd

print("\n" + "="*50)
print(" DATA EXPLORATION REPORT")
print("="*50)

# 1. Create DataFrame
data = {
    "Name": ["Arun", "Beema", "Charles", "Divya", "Elango", "Farhan", "Guna", "Hari"],
    "Age": [21, 22, 23, 21, 22, 23, 21, 22],
    "City": ["Chennai", "Coimbatore", "Chennai", "Madurai", "Coimbatore", "Chennai",
             "Madurai", "Coimbatore"],
    "Score": [88, 92, 85, 90, 75, 89, 95, 78]
}
df = pd.DataFrame(data)

# Function to print section title
def section(title):
    print("\n" + "-"*50)
    print(title)
    print("-"*50)

section("DATAFRAME")
print(df)

print("\n=== INFO ===")
df.info()  # prints column names, non-null counts, and dtypes itself
print("Rows :", df.shape[0])
print("Cols :", df.shape[1])

section("SHAPE (ROWS, COLUMNS)")
print(df.shape)

section("HEAD (FIRST 5 ROWS)")
print(df.head())

section("TAIL (LAST 5 ROWS)")
print(df.tail())

section("DATA TYPES")
print(df.dtypes)

section("DESCRIBE")
print(df.describe())

# numeric_only=True skips the text column "Name" when averaging
section("GROUP BY CITY (MEAN AGE & SCORE)")
print(df.groupby("City").mean(numeric_only=True))

Result:

Thus, a dataset has been successfully explored in Python using a DataFrame and
the info, shape, head, tail, dtypes, describe, and groupby operations.
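
As a possible extension of the grouping step, the sketch below computes several statistics per city in one call using pandas named aggregation. It reuses the dataset from this exercise; the output column names (students, mean_score, max_score) are chosen here for illustration and are not part of the original program.

```python
import pandas as pd

# Same illustrative dataset as in Exercise 2.
df = pd.DataFrame({
    "Name": ["Arun", "Beema", "Charles", "Divya", "Elango", "Farhan", "Guna", "Hari"],
    "Age": [21, 22, 23, 21, 22, 23, 21, 22],
    "City": ["Chennai", "Coimbatore", "Chennai", "Madurai",
             "Coimbatore", "Chennai", "Madurai", "Coimbatore"],
    "Score": [88, 92, 85, 90, 75, 89, 95, 78],
})

# Named aggregation: each keyword becomes an output column,
# given as (source column, aggregation function).
# The output column names are illustrative assumptions.
city_summary = df.groupby("City").agg(
    students=("Name", "count"),
    mean_score=("Score", "mean"),
    max_score=("Score", "max"),
)
print(city_summary)
```

Compared with calling mean() on the whole group, this form makes it explicit which column feeds which statistic.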

Ex:3

Aim:
To extract important variables and remove useless variables from the dataset.

Procedure:
● Go to Start, search for Python IDLE, and open it.
● Open a new Python file and create a dataset.
● Type the program given below and save it as [Link]
● Run the program.

Program:

import pandas as pd

# Dataset
data = {
    'Name': ['Arun', 'Balu', 'Charan', 'Deepa'],
    'Age': [21, 22, 20, 23],
    'Marks': [85, 90, 78, 88],
    'City': ['Erode', 'Karur', 'Coimbatore', 'Chennai'],
    'Extra': ['x', 'y', 'z', 'p']
}
df = pd.DataFrame(data)
print("Original Dataset:")
print(df)

# IMPORTANT VARIABLES
# Selecting with a list of column names returns a new DataFrame
important_columns = ['Name', 'Age', 'Marks']
df_important = df[important_columns]
print("\nImportant Variables Dataset:")
print(df_important)

# REMOVAL OF USELESS VARIABLES
# axis=1 drops columns; df itself is left unchanged
useless_columns = ['Extra']
df_cleaned = df.drop(useless_columns, axis=1)
print("\nDataset After Removing Useless Variables:")
print(df_cleaned)

Result:

Thus, the important variables have been extracted and the useless variables
removed from the dataset successfully.
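
The selection and removal steps can also be written with loc and the columns keyword of drop. The sketch below is one possible variant, not the method prescribed by the exercise. The column name "Remarks" is hypothetical and is included only to show that errors="ignore" skips names that are not present.

```python
import pandas as pd

# Same illustrative dataset as in Exercise 3.
df = pd.DataFrame({
    "Name": ["Arun", "Balu", "Charan", "Deepa"],
    "Age": [21, 22, 20, 23],
    "Marks": [85, 90, 78, 88],
    "City": ["Erode", "Karur", "Coimbatore", "Chennai"],
    "Extra": ["x", "y", "z", "p"],
})

# Keep a fixed list of columns; .copy() makes the result independent of df.
df_important = df.loc[:, ["Name", "Age", "Marks"]].copy()

# drop(columns=...) is equivalent to drop(..., axis=1).
# "Remarks" is a hypothetical column that does not exist here;
# errors="ignore" skips it instead of raising KeyError.
df_cleaned = df.drop(columns=["Extra", "Remarks"], errors="ignore")

print(df_important)
print(df_cleaned)
```

Neither call modifies df, so the original dataset remains available for comparison.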

Ex:4

Aim:
To identify and fill missing values within the dataset.

Procedure:
● Go to Start, search for Python IDLE, and open it.
● Open a new Python file and create a dataset with missing values.
● Type the program given below and save it as [Link]
● Run the program.

Program:

import pandas as pd
import numpy as np

# Dataset with missing values
data = {
    'Name': ['Arun', 'Balu', 'Cheran', 'Deepa'],
    'Age': [21, np.nan, 20, 23],
    'Marks': [85, 90, np.nan, 88],
    'City': ['Tirupur', 'Erode', None, 'Chennai']
}
df = pd.DataFrame(data)
print("Original Dataset:")
print(df)

# 1. IDENTIFY MISSING VALUES
print("\nMissing Values Count in Each Column:")
print(df.isnull().sum())

# 2. FILL MISSING VALUES
# Fill numeric columns with mean (mean() ignores missing values)
df['Age'] = df['Age'].fillna(df['Age'].mean())
df['Marks'] = df['Marks'].fillna(df['Marks'].mean())

# Fill text columns with a placeholder
df['City'] = df['City'].fillna("Unknown")
print("\nDataset After Filling Missing Values:")
print(df)

Result:

Thus, missing values in the dataset were successfully identified and filled.
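
As a variation on the filling step, the sketch below lists the rows that contain missing values and then fills every column in a single fillna call using a dictionary of per-column values. Using the median instead of the mean is an assumption made here for illustration (it is less sensitive to outliers); the exercise itself uses the mean.

```python
import numpy as np
import pandas as pd

# Same illustrative dataset as in Exercise 4.
df = pd.DataFrame({
    "Name": ["Arun", "Balu", "Cheran", "Deepa"],
    "Age": [21, np.nan, 20, 23],
    "Marks": [85, 90, np.nan, 88],
    "City": ["Tirupur", "Erode", None, "Chennai"],
})

# Rows that contain at least one missing value.
rows_with_missing = df[df.isna().any(axis=1)]
print(rows_with_missing)

# One fill value per column, applied in a single call.
# Median is an illustrative choice, not the exercise's method.
fill_values = {
    "Age": df["Age"].median(),
    "Marks": df["Marks"].median(),
    "City": "Unknown",
}
df_filled = df.fillna(value=fill_values)
print(df_filled)
```

Because fillna returns a new DataFrame here, df keeps its missing values and the two versions can be compared side by side.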
