Experiment No. 10
Program to implement functions of Pandas Library
Name/Roll No. : shruti
Class: SE-2
Date of Performance: 21/3
Date of Submission: 28/3
Title: Program to implement functions of Pandas Library
Aim: To write a program that implements functions of the Pandas library
Objective: To introduce the Pandas package for Python
Theory:
Pandas is a powerful and flexible Python library used for data manipulation, analysis, and
cleaning. It is built on top of NumPy and provides high-performance data structures and
functions for working with structured data.
The two primary data structures in Pandas are:
Series: A one-dimensional labeled array capable of holding any data type, similar to a single
column in a spreadsheet or a database table.
DataFrame: A two-dimensional labeled data structure, akin to a table with rows and columns. It
is the most commonly used structure in Pandas for handling tabular data.
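The two structures described above can be sketched as follows. This is a minimal illustration, not part of the experiment program; the names and scores are hypothetical values chosen only for the example.

```python
import pandas as pd

# A Series: one-dimensional values, each with an index label.
# (The labels and marks here are made-up example data.)
marks = pd.Series([78, 91, 64], index=["asha", "ravi", "meena"])

# A DataFrame: a table whose columns are Series sharing one row index.
students = pd.DataFrame(
    {"name": ["asha", "ravi", "meena"], "score": [78, 91, 64]}
)

print(marks["ravi"])       # label-based lookup on the Series
print(students["score"])   # selecting one column returns a Series
print(students.shape)      # (number of rows, number of columns)
```

Selecting a single column of a DataFrame returns a Series, which is why the two structures are usually learned together.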
Pandas provides a variety of functions and tools, including:
Data Import/Export: Supports reading from and writing to various file formats such as CSV,
Excel, SQL, and JSON.
Data Cleaning: Offers methods for handling missing data, removing duplicates, and applying
transformations.
Data Manipulation: Includes operations like filtering, grouping, merging, and pivoting datasets.
Data Analysis: Facilitates statistical analysis, descriptive statistics, and time-series operations.
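The four functional areas listed above can be sketched in one short, non-interactive example. It assumes a small hypothetical CSV text (held in memory with io.StringIO instead of a file on disk); the column names and values are invented for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for a file on disk.
csv_text = """name,dept,score
asha,CS,78
ravi,IT,91
ravi,IT,91
meena,CS,
"""

# Import: read_csv accepts a path or any file-like object.
raw = pd.read_csv(io.StringIO(csv_text))

# Cleaning: drop the duplicated row, then fill the missing score
# with the mean of the remaining scores.
clean = raw.drop_duplicates().reset_index(drop=True)
clean["score"] = clean["score"].fillna(clean["score"].mean())

# Manipulation: filter rows, then group by department and aggregate.
high = clean[clean["score"] >= 80]
dept_mean = clean.groupby("dept")["score"].mean()

# Analysis: descriptive statistics for the numeric column.
summary = clean["score"].describe()

# Export: to_csv called without a path returns the CSV text as a string.
exported = clean.to_csv(index=False)

print(clean)
print(high)
print(dept_mean)
print(summary)
```

Filling with the column mean is only one possible cleaning strategy; dropping the incomplete row with dropna() would be an equally valid choice for this sketch.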
Pandas is widely used in data science, machine learning, and analytics because it handles
in-memory tabular datasets efficiently and provides intuitive tools for data exploration and
preprocessing. Its integration with libraries such as Matplotlib and NumPy makes it a
cornerstone of the Python data analysis ecosystem.
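As a small illustration of the NumPy integration mentioned above, the sketch below applies a NumPy function to a Series and converts a Series to a NumPy array. The values and labels are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up example values with string labels.
scores = pd.Series([4.0, 9.0, 16.0], index=["a", "b", "c"])

# NumPy universal functions work elementwise on a Series
# and the result keeps the original index labels.
roots = np.sqrt(scores)

# to_numpy() returns the values as a plain NumPy array.
arr = scores.to_numpy()

print(roots)
print(type(arr), arr.sum())
```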
Program:
import pandas as pd
import numpy as np
import os
# 1. Creating a Pandas Series from user input
print("1. Creating a Pandas Series:")
n = int(input("Enter number of elements in the Series: "))
series_data = []
series_index = []
for i in range(n):
    val = input(f"Enter value {i+1}: ")
    idx = input(f"Enter index for value {val}: ")
    series_data.append(val)
    series_index.append(idx)
data_series = pd.Series(series_data, index=series_index)
print("\nGenerated Series:")
print(data_series, "\n")
# 2. Creating a DataFrame from user input
print("2. Creating a Pandas DataFrame:")
rows = int(input("Enter number of rows: "))
columns = int(input("Enter number of columns: "))
column_names = []
for i in range(columns):
    col = input(f"Enter name of column {i+1}: ")
    column_names.append(col)
data = []
for i in range(rows):
    row_data = []
    print(f"Enter data for row {i+1}:")
    for col in column_names:
        value = input(f" {col}: ")
        row_data.append(value)
    data.append(row_data)
df = pd.DataFrame(data, columns=column_names)
print("\nGenerated DataFrame:")
print(df, "\n")
# 3. Exporting and Importing to/from CSV
csv_path = r"C:\Users\shruti\OneDrive\Desktop\shruti\DM.csv"
df.to_csv(csv_path, index=False)
print(f"Data exported successfully to:\n{csv_path}")
df_from_csv = pd.read_csv(csv_path)
print("\nData read back from CSV:")
print(df_from_csv, "\n")
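# 4. Data Cleaning Example (optional)
print("4. Handling Missing Data (Simulating a missing value):")
df_with_nan = df.copy()
if len(df_with_nan) > 0 and 'Score' in df_with_nan.columns:
    # input() returns strings, so convert 'Score' to floats first;
    # entries that are not numeric become NaN.
    df_with_nan['Score'] = pd.to_numeric(df_with_nan['Score'], errors='coerce').astype(float)
    df_with_nan.loc[0, 'Score'] = np.nan
    print("Before cleaning:\n", df_with_nan)
    # Fill only the 'Score' column with its mean (mean() skips NaN values).
    df_filled = df_with_nan.copy()
    df_filled['Score'] = df_filled['Score'].fillna(df_filled['Score'].mean())
    print("After filling NaN with mean:\n", df_filled)
else:
    print("No 'Score' column to demonstrate missing data handling.\n")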
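# 5. Data Analysis
# Values read with input() are strings, so describe(include='all') reports
# count, unique, top and freq rather than mean or standard deviation.
print("5. Basic Statistics (if numeric columns exist):")
try:
    print(df.describe(include='all'), "\n")
except Exception as e:
    print(f"Could not compute statistics: {e}")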
# 6. Exporting to Excel
# Writing .xlsx files requires an Excel writer engine such as openpyxl to be installed.
excel_path = r"C:\Users\shruti\OneDrive\Desktop\shruti kothari\user_data.xlsx"
df.to_excel(excel_path, index=False)
print(f"Data exported to Excel at:\n{excel_path}")
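One point worth noting about the program above: every value collected with input() is a string, while a CSV round trip lets Pandas infer numeric types again. The sketch below illustrates this under stated assumptions: the DataFrame contents are hypothetical, and an in-memory buffer stands in for the CSV file so that no path is needed.

```python
import io
import pandas as pd

# Values typed at a prompt arrive as strings, so 'Score' is not numeric here.
# (The names and scores are made-up example data.)
typed = pd.DataFrame({"Name": ["A", "B"], "Score": ["80", "90"]})

# Round trip through CSV text; read_csv infers an integer type for 'Score'.
buffer = io.StringIO(typed.to_csv(index=False))
reloaded = pd.read_csv(buffer)

# Explicit conversion without the round trip;
# errors="coerce" would turn non-numeric entries into NaN.
converted = pd.to_numeric(typed["Score"], errors="coerce")

print(typed["Score"].dtype, reloaded["Score"].dtype, converted.dtype)
```

This is why numeric statistics such as the mean are more naturally computed on the DataFrame read back from CSV, or after an explicit pd.to_numeric conversion, than on the DataFrame built directly from typed input.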
Output:
Conclusion: Comment on the functional areas where Pandas library is used
The Pandas library is widely used for data analysis, data manipulation, and basic data
visualization across many domains. In data science it is used to load and clean datasets and to
perform operations such as filtering, grouping, and aggregation. In finance and business
analytics, Pandas supports time-series analysis, risk assessment, and performance tracking. It is
also used extensively in machine learning and research, where it helps preprocess and structure
raw data for model training and analysis.
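The time-series use mentioned in the conclusion can be illustrated with a short sketch. The prices below are invented for the example and do not come from any real dataset.

```python
import pandas as pd

# Hypothetical daily closing prices; the numbers are made up for illustration.
dates = pd.date_range("2024-01-01", periods=6, freq="D")
prices = pd.Series([100.0, 102.0, 101.0, 105.0, 107.0, 106.0], index=dates)

# Day-over-day fractional change (the first entry has no previous day, so it is NaN).
returns = prices.pct_change()

# 3-day rolling mean to smooth short-term movement.
rolling = prices.rolling(window=3).mean()

# Resample the daily series into 3-day bins and take the mean of each bin.
binned = prices.resample("3D").mean()

print(returns.round(4))
print(rolling)
print(binned)
```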