DEPARTMENT OF INFORMATION COMMUNICATION &
TECHNOLOGY
PRACTICAL FILE
OF
BCA 212P
(INTRODUCTION OF DATA SCIENCE)
Academic session: 2024-25
Batch: 2023-26
Submitted to:
Ms. Mansi Jaiswal
(Assistant Professor)

Submitted by:
Name: Harsh Negi
Enrolment no: 00717002023
Program: BCA
Semester: 4th
Shift: 1st
Division: A
INDEX
S.no Experiments Sign
1 Create a pandas series from a dictionary of values and an
ndarray.
2 Create a Series and print all the elements that are above
the 75th percentile.
3 Perform sorting on Series data and DataFrames
4 Write a program to implement pivot() and pivot_table() on a
DataFrame.
5 Write a program to find mean absolute deviation on a
DataFrame.
6 Two Series objects: Population stores the details of four
metro cities of India, and AvgIncome stores the total
average income reported in four years in these cities.
Calculate income per capita for each of these metro
cities.
7 Create a DataFrame based on E-Commerce data and
generate mean, mode, median.
8 Create a DataFrame based on employee data and generate
quartile and variance.
9 Program to implement Skewness on Random data.
10 Create a DataFrame on any data and compute the statistical
function Kurtosis.
11 Series objects Temp1, Temp2, Temp3, Temp4 store the
temperatures of the days of week 1, week 2, week 3, week 4.
Write a script to:
a. Print average temperature per week
b. Print average temperature of entire month
12 Write a Program to read a CSV file and create its
DataFrame.
13 Consider the DataFrame QtrSales where each row contains
the item category, item name and expenditure and group
the rows by category, and print the average expenditure per
category.
14 Create a DataFrame having age, name, weight of five
students. Write a program to display only the weight of first
and fourth rows.
15 Write a program to create a DataFrame to store weight, age
and name of three people. Print the DataFrame and its
transpose.
Experiment-1
Create a pandas series from a dictionary of values and an
ndarray.
Code: -
import pandas as pd
import numpy as np
data=np.array([1,2,3,4,5])
Series1=pd.Series(data)
print(Series1)
data_dict={"a":10,"b":20,"c":30}
Series2=pd.Series(data_dict)
print(Series2)
Output: -
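As a supplementary sketch for this experiment: both constructors also accept an explicit index. The labels and values below are illustrative assumptions, not part of the recorded program. With an ndarray the index relabels the values by position; with a dictionary the index selects keys, and a label with no matching key becomes NaN.

```python
import numpy as np
import pandas as pd

# From an ndarray: index labels are assigned by position.
arr_series = pd.Series(np.array([1, 2, 3]), index=["x", "y", "z"])

# From a dictionary: the index selects keys; "d" has no key, so it becomes NaN.
dict_series = pd.Series({"a": 10, "b": 20, "c": 30}, index=["a", "c", "d"])

print(arr_series)
print(dict_series)
```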
Experiment-2
Create a Series and print all the elements that are above the 75th
percentile.
Code: -
import pandas as pd
import numpy as np
# Create a random Series
np.random.seed(42) # For reproducibility
s = pd.Series(np.random.randint(1, 100, 10)) # 10 random integers from 1 to 99 (the upper bound is exclusive)
print("Original Series:\n", s)
# Calculate 75th percentile
percentile_75 = s.quantile(0.75)
print("\n75th Percentile:", percentile_75)
# Filter and print elements above 75th percentile
above_75th = s[s > percentile_75]
print("\nElements above 75th percentile:\n", above_75th)
Output: -
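A small worked example may help when checking the result by hand. It assumes illustrative marks rather than the random data of the program. By default quantile() interpolates linearly between neighbouring values, and the strict > comparison leaves out any element equal to the percentile itself.

```python
import pandas as pd

marks = pd.Series([10, 20, 30, 40, 50])

# Position 0.75 * (5 - 1) = 3 falls exactly on the fourth value, 40.
p75 = marks.quantile(0.75)

above = marks[marks > p75]          # strict comparison: only 50
at_or_above = marks[marks >= p75]   # inclusive comparison: 40 and 50

# With four values the position 0.75 * 3 = 2.25 lies between 30 and 40,
# so linear interpolation gives 30 + 0.25 * 10 = 32.5.
p75_interpolated = pd.Series([10, 20, 30, 40]).quantile(0.75)

print(p75, list(above), list(at_or_above), p75_interpolated)
```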
Experiment-3
Perform sorting on Series data and DataFrames.
Code: -
import pandas as pd
# Create a Series
my_series = pd.Series([5, 1, 9, 2, 7])
print("Original Series:\n", my_series)
# Sort the Series (smallest to largest)
sorted_series = my_series.sort_values()
print("\nSorted Series:\n", sorted_series)
# Sort Series from largest to smallest
sorted_series_desc = my_series.sort_values(ascending=False)
print("\nSorted Series (Descending):\n", sorted_series_desc)
# --- Sorting DataFrames (Easy) ---
# Create a DataFrame
data = {'Name': ['Charlie', 'Alice', 'Bob'],
'Age': [25, 30, 22]}
my_df = pd.DataFrame(data)
print("\nOriginal DataFrame:\n", my_df)
# Sort the DataFrame by 'Age' (youngest to oldest)
sorted_df = my_df.sort_values(by='Age')
print("\nSorted DataFrame by Age:\n", sorted_df)
# Sort the DataFrame by 'Name' (alphabetical order)
sorted_df_name = my_df.sort_values(by='Name')
print("\nSorted DataFrame by Name:\n", sorted_df_name)
# Sort the DataFrame by 'Age' (oldest to youngest)
sorted_df_desc_age = my_df.sort_values(by='Age', ascending=False)
print("\nSorted DataFrame by Age (Descending):\n", sorted_df_desc_age)
Output: -
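As an extension of the sorting shown in this experiment, sort_values() also accepts several columns, each with its own direction, and sort_index() restores the original row order. The sketch below uses assumed data (the row for "Dana" is hypothetical) so that ages repeat and the second key matters.

```python
import pandas as pd

people = pd.DataFrame({
    "Name": ["Charlie", "Alice", "Bob", "Dana"],
    "Age": [25, 30, 25, 30],
})

# Oldest first; ties on Age are broken alphabetically by Name.
by_age_then_name = people.sort_values(by=["Age", "Name"], ascending=[False, True])
print(by_age_then_name)

# Sorting by the index puts the rows back in their original order.
restored = by_age_then_name.sort_index()
print(restored)
```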
Experiment-4
Write a program to implement pivot() and pivot_table() on a
DataFrame.
Code: -
import pandas as pd
# Sample DataFrame
data = {
'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02', '2023-01-03'],
'Category': ['A', 'B', 'A', 'B', 'A'],
'Value': [10, 20, 15, 25, 30]
}
df = pd.DataFrame(data)
# Display the original DataFrame
print("Original DataFrame:")
print(df)
# Using pivot() to reshape the DataFrame
pivot_df = df.pivot(index='Date', columns='Category', values='Value')
print("\nPivoted DataFrame using pivot():")
print(pivot_df)
# Using pivot_table() to reshape the DataFrame
# Here we will use pivot_table to handle potential duplicates by taking the mean
pivot_table_df = df.pivot_table(index='Date', columns='Category', values='Value',
aggfunc='mean')
print("\nPivoted DataFrame using pivot_table():")
print(pivot_table_df)
Output: -
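The difference between the two functions shows up only when an index/column pair occurs more than once, which the sample data of this experiment does not contain. The following sketch uses assumed data with a duplicated pair: pivot() raises a ValueError, while pivot_table() aggregates the duplicates.

```python
import pandas as pd

sales = pd.DataFrame({
    "Date": ["2023-01-01", "2023-01-01", "2023-01-02"],
    "Category": ["A", "A", "B"],   # ("2023-01-01", "A") appears twice
    "Value": [10, 30, 25],
})

try:
    sales.pivot(index="Date", columns="Category", values="Value")
    pivot_failed = False
except ValueError as err:
    pivot_failed = True
    print("pivot() failed:", err)

# pivot_table() combines the duplicates with the aggregation function.
table = sales.pivot_table(index="Date", columns="Category",
                          values="Value", aggfunc="mean")
print(table)
```

Cells with no matching rows, such as category B on 2023-01-01, are filled with NaN.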
Experiment-5
Write a program to find mean absolute deviation on a
DataFrame.
Code: -
import pandas as pd
# Sample DataFrame
data = {
'A': [1, 2, 3, 4, 5],
'B': [5, 6, 7, 8, 9],
'C': [10, 11, 12, 13, 14]
}
df = pd.DataFrame(data)
print("Original DataFrame:\n", df)
# Function to calculate Mean Absolute Deviation
def mean_absolute_deviation(df):
    # Calculate the mean of each column
    mean = df.mean()
    # Calculate the absolute deviation from the mean
    absolute_deviation = abs(df - mean)
    # Calculate the mean of the absolute deviations
    mad = absolute_deviation.mean()
    return mad
# Calculate Mean Absolute Deviation for the DataFrame
mad_result = mean_absolute_deviation(df)
# Display the result
print("Mean Absolute Deviation for each column:")
print(mad_result)
Output: -
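The result can be checked by hand. For column A the mean is 3, the absolute deviations are 2, 1, 0, 1, 2, and their mean is 6 / 5 = 1.2. Columns B and C are the same values shifted by a constant, so they give the same figure. The sketch below repeats the calculation in one expression and compares it with the manual value; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5],
    "B": [5, 6, 7, 8, 9],
    "C": [10, 11, 12, 13, 14],
})

# Mean absolute deviation per column: mean(|x - mean(x)|)
mad = (df - df.mean()).abs().mean()

# Manual check for column A
manual_a = sum(abs(x - 3) for x in [1, 2, 3, 4, 5]) / 5

print(mad)
print("Manual MAD for A:", manual_a)
```

Computing the value explicitly also keeps the program independent of DataFrame.mad(), which was deprecated in pandas 1.5 and removed in pandas 2.0.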
Experiment-6
Two Series objects: Population stores the details of four metro
cities of India, and AvgIncome stores the total
average income reported in four years in these cities.
Calculate income per capita for each of these metro cities.
Code:-
import pandas as pd
# Example data for Population (in millions)
Population = pd.Series({
    'DehraDun': 20.4,
    'Almora': 18.9,
    'Nainital': 12.3,
})
print("Population of Different cities:")
print(Population,end="\n\n")
# Example data for AvgIncome (in millions)
AvgIncome = pd.Series({
    'DehraDun': 150,
    'Almora': 120,
    'Nainital': 100,
})
print("Average Income of Different cities:")
print(AvgIncome,end="\n\n")
# Calculate income per capita
IncomePerCapita = AvgIncome / Population
# Display the result
print("IncomePerCapita of Different Cities:")
print(IncomePerCapita)
Output:-
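The division in this experiment works because pandas aligns the two Series by index label, not by position. The sketch below uses hypothetical city labels and figures (assumptions, not the data of the program) to show what happens when the labels differ in order or do not match.

```python
import pandas as pd

population = pd.Series({"CityA": 20.0, "CityB": 10.0, "CityC": 5.0})
avg_income = pd.Series({"CityB": 50.0, "CityA": 100.0, "CityD": 70.0})

# Labels are matched by name: CityA -> 100 / 20, CityB -> 50 / 10.
# CityC and CityD appear in only one Series, so their result is NaN.
per_capita = avg_income / population
print(per_capita)

# Keep only the cities present in both Series.
matched = per_capita.dropna()
print(matched)
```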
Experiment-7
Create a DataFrame based on E-Commerce data and generate
mean, mode, median.
Code:-
import pandas as pd
# Sample E-Commerce data
data = {
'OrderID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'Product': ['Laptop', 'Smartphone', 'Tablet', 'Laptop', 'Smartphone',
'Tablet', 'Laptop', 'Smartphone', 'Tablet', 'Laptop'],
'Quantity': [1, 2, 1, 1, 3, 2, 1, 1, 2, 1],
'Price': [1000, 500, 300, 1000, 500, 300, 1000, 500, 300, 1000]
}
# Create DataFrame
ecommerce_df = pd.DataFrame(data)
# Display the DataFrame
print("E-Commerce Dataframe:")
print(ecommerce_df)
print("\n")
# Calculate mean
mean_price = ecommerce_df['Price'].mean()
# Calculate mode
mode_price = ecommerce_df['Price'].mode()[0]
# Calculate median
median_price = ecommerce_df['Price'].median()
# Display the results
print(f"Mean Price: {mean_price}")
print(f"Mode Price: {mode_price}")
print(f"Median Price: {median_price}")
Output:-
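The three statistics can be verified by hand for the Price column: the sum is 4 x 1000 + 3 x 500 + 3 x 300 = 6400, so the mean is 640; the two middle values of the sorted list are both 500; and 1000 occurs most often. The sketch below repeats that check and also shows, with assumed tie data, why mode() returns a Series: when several values are equally frequent, all of them are returned.

```python
import pandas as pd

prices = pd.Series([1000, 500, 300, 1000, 500, 300, 1000, 500, 300, 1000])

mean_price = prices.mean()
median_price = prices.median()
modes = prices.mode()          # a Series, here with the single value 1000

# Assumed data with a tie: 300 and 500 both occur twice.
tied_modes = pd.Series([300, 300, 500, 500, 1000]).mode()

print(mean_price, median_price, list(modes), list(tied_modes))
```

Taking mode()[0], as the program does, therefore picks the smallest of the tied values.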
Experiment-8
Create a DataFrame based on employee data and generate
quartile and variance.
Code:-
import pandas as pd
# Sample employee data
data = {
'EmployeeID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'Name': ['Krishna', 'Murali', 'Chaitanya', 'Shyam', 'Govind',
'Madhav', 'Gopal', 'Gopal', 'Murari', 'Keshava'],
'Age': [25, 30, 35, 40, 28, 32, 45, 50, 29, 38],
'Salary': [50000, 60000, 70000, 80000, 55000,
62000, 75000, 90000, 58000, 72000],
'YearsAtCompany': [1, 2, 3, 4, 1, 2, 5, 6, 2, 3]
}
# Create DataFrame
employee_df = pd.DataFrame(data)
# Display the DataFrame
print("Employees Data:")
print(employee_df)
# Calculate quartiles
quartiles_salary = employee_df['Salary'].quantile([0.25, 0.5, 0.75])
quartiles_years = employee_df['YearsAtCompany'].quantile([0.25, 0.5, 0.75])
# Calculate variance
variance_salary = employee_df['Salary'].var()
variance_years = employee_df['YearsAtCompany'].var()
# Display the results
print("\nQuartiles for Salary:")
print(quartiles_salary)
print("\nQuartiles for Years at Company:")
print(quartiles_years)
print(f"\nVariance for Salary: {variance_salary}")
print(f"Variance for Years at Company: {variance_years}")
Output: -
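When comparing the variance with a hand calculation, note that var() in pandas returns the sample variance (divisor n - 1) by default. The sketch below uses the YearsAtCompany values of this experiment: the mean is 2.9 and the squared deviations sum to 24.9, so the sample variance is 24.9 / 9 and the population variance is 24.9 / 10.

```python
import pandas as pd

years = pd.Series([1, 2, 3, 4, 1, 2, 5, 6, 2, 3])

sample_var = years.var()            # ddof=1 by default: divides by n - 1
population_var = years.var(ddof=0)  # divides by n

print(f"Sample variance: {sample_var:.4f}")
print(f"Population variance: {population_var:.4f}")
```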
Experiment-9
Program to implement Skewness on Random data.
Code: -
# Program to implement Skewness on Random data.
import numpy as np
from scipy.stats import skew
# Generate random data
data = np.random.normal(1, 100, 15) # 15 samples, mean 1, standard deviation 100
print("Random Numbers:")
print(data)
# Calculate skewness
data_skewness = skew(data)
# Print the skewness
print(f"\nSkewness of the data: {data_skewness}")
Output: -
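To make the number returned by skew() less of a black box, the sketch below computes the Fisher-Pearson coefficient g1 = m3 / m2^1.5 directly from the central moments, which is the quantity scipy.stats.skew returns with its default bias=True. The helper name sample_skewness and the small test lists are illustrative assumptions; a seeded generator is used so that the run is repeatable.

```python
import numpy as np

def sample_skewness(values):
    """Biased sample skewness g1 = m3 / m2**1.5 from central moments."""
    x = np.asarray(values, dtype=float)
    deviations = x - x.mean()
    m2 = np.mean(deviations ** 2)   # second central moment
    m3 = np.mean(deviations ** 3)   # third central moment
    return m3 / m2 ** 1.5

# Seeded generator: the same 15 numbers on every run.
rng = np.random.default_rng(42)
data = rng.normal(loc=1, scale=100, size=15)
print("Skewness of random data:", sample_skewness(data))

symmetric = sample_skewness([1, 2, 3, 4, 5])       # symmetric data
right_tailed = sample_skewness([1, 1, 1, 2, 10])   # long tail to the right
print(symmetric, right_tailed)
```

Symmetric data gives zero, and a long right tail gives a positive value.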
Experiment-10
Create a DataFrame on any data and compute the statistical
function Kurtosis.
Code: -
import pandas as pd
from scipy.stats import kurtosis
# Step 1: Create a sample DataFrame
data = {
'EmployeeID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
'Name':['Krishna', 'Murali', 'Chaitanya', 'Shyam', 'Govind',
'Madhav', 'Gopal', 'Gopal', 'Murari', 'Keshava'],
'Age': [25, 30, 35, 40, 28, 32, 45, 50, 29, 38],
'Salary': [50000, 60000, 70000, 80000, 55000,
62000, 75000, 90000, 58000, 72000],
'YearsAtCompany': [1, 2, 3, 4, 1, 2, 5, 6, 2, 3]
}
# Create DataFrame
employee_df = pd.DataFrame(data)
# Display the DataFrame
print("Employee DataFrame:")
print(employee_df)
# Step 2: Compute kurtosis for the 'Salary' column
kurtosis_salary = kurtosis(employee_df['Salary'], fisher=True) # Fisher's definition (subtracts 3)
# Display the kurtosis result
print(f"\nKurtosis of Salary: {kurtosis_salary}")
Output: -
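pandas has its own kurt() method, and its result differs from the SciPy call used in this experiment. SciPy returns the biased estimate by default (bias=True), whereas pandas applies the small-sample correction. The sketch below, using the Salary values from this experiment, suggests that passing bias=False to SciPy reproduces the pandas figure.

```python
import numpy as np
import pandas as pd
from scipy.stats import kurtosis

salary = pd.Series([50000, 60000, 70000, 80000, 55000,
                    62000, 75000, 90000, 58000, 72000])

biased = kurtosis(salary, fisher=True)                 # SciPy default, bias=True
corrected = kurtosis(salary, fisher=True, bias=False)  # bias-corrected estimate
pandas_kurt = salary.kurt()                            # pandas excess kurtosis

print(f"SciPy (biased):    {biased:.4f}")
print(f"SciPy (corrected): {corrected:.4f}")
print(f"pandas kurt():     {pandas_kurt:.4f}")
```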
Experiment-11
Series objects Temp1, Temp2, Temp3, Temp4 store the
temperatures of the days of week 1, week 2, week 3, week 4.
Write a script to:
a. Print average temperature per week
b. Print average temperature of entire month
Code: -
import pandas as pd
# Sample temperature data for four weeks (7 days each)
data = {
'Week 1': [30, 32, 31, 29, 28, 30, 31], # Week 1
'Week 2': [31, 30, 29, 32, 33, 31, 30], # Week 2
'Week 3': [28, 29, 30, 31, 32, 30, 29], # Week 3
'Week 4': [30, 31, 32, 33, 34, 30, 31] # Week 4
}
# Create DataFrame
temperature_df = pd.DataFrame(data)
# Display the DataFrame
print("Temperature DataFrame:")
print(temperature_df)
# a. Print average temperature per week
avg_temp_per_week = temperature_df.mean()
print("\nAverage temperature per week:")
print(avg_temp_per_week)
# b. Print average temperature of entire month
avg_temp_month = temperature_df.values.flatten().mean()
print(f"\nAverage temperature for the entire month: {avg_temp_month:.2f}°C")
Output: -
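The problem statement asks for four Series objects, whereas the program stores the weeks as DataFrame columns. An alternative sketch that follows the statement more literally is given below, using the same temperature values. The weekly averages come from each Series, and the monthly average from all 28 readings joined with pd.concat().

```python
import pandas as pd

Temp1 = pd.Series([30, 32, 31, 29, 28, 30, 31])
Temp2 = pd.Series([31, 30, 29, 32, 33, 31, 30])
Temp3 = pd.Series([28, 29, 30, 31, 32, 30, 29])
Temp4 = pd.Series([30, 31, 32, 33, 34, 30, 31])

weeks = {"Week 1": Temp1, "Week 2": Temp2, "Week 3": Temp3, "Week 4": Temp4}

# a. Average temperature per week
for name, temps in weeks.items():
    print(f"{name}: {temps.mean():.2f}")

# b. Average temperature of the entire month (all 28 readings together)
month = pd.concat(list(weeks.values()), ignore_index=True)
monthly_average = month.mean()
print(f"Month: {monthly_average:.2f}")
```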
Experiment-12
Write a Program to read a CSV file and create its DataFrame.
Code: -
CSV File
EmployeeID,Name,Age,Salary
1,Shyam,30,50000
2,Gopal,25,60000
3,Madhav,35,70000
4,keshava,40,80000
5,Murari,28,55000
Python File
import pandas as pd
# Step 1: Read the CSV file
file_path = 'L12.csv' # Make sure this path is correct
employee_df = pd.read_csv(file_path)
# Step 2: Display the DataFrame
print("Employee DataFrame:")
print(employee_df)
# Optional: Display basic information about the DataFrame
print("\nBasic Information about the DataFrame:")
employee_df.info() # info() prints its report itself and returns None, so it is not wrapped in print()
# Optional: Display the first few rows of the DataFrame
print("\nFirst few rows of the DataFrame:")
print(employee_df.head())
Output: -
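The program depends on the file L12.csv being present in the working directory. For trying the same steps without a file on disk, read_csv() also accepts a file-like object. The sketch below feeds it the CSV text of this experiment through io.StringIO; the variable csv_text is introduced only for illustration.

```python
import io
import pandas as pd

csv_text = """EmployeeID,Name,Age,Salary
1,Shyam,30,50000
2,Gopal,25,60000
3,Madhav,35,70000
4,keshava,40,80000
5,Murari,28,55000
"""

# StringIO makes the text behave like an open file.
employee_df = pd.read_csv(io.StringIO(csv_text))

print(employee_df)
print(employee_df.dtypes)
```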
Experiment-13
Consider the DataFrame QtrSales where each row contains
the item category, item name and expenditure and group the
rows by category, and print the average expenditure per
category.
Code: -
import pandas as pd
# Sample data for QtrSales DataFrame
data = {
'Category': ['Electronics', 'Electronics', 'Clothing', 'Clothing', 'Groceries',
'Groceries'],
'Item': ['Laptop', 'Smartphone', 'T-shirt', 'Jeans', 'Milk', 'Bread'],
'Expenditure': [1200, 800, 50, 60, 30, 20]
}
# Create DataFrame
QtrSales = pd.DataFrame(data)
# Display the DataFrame
print("QtrSales DataFrame:")
print(QtrSales)
# Group by 'Category' and calculate the average expenditure
average_expenditure = QtrSales.groupby('Category')['Expenditure'].mean()
# Display the average expenditure per category
print("\nAverage Expenditure per Category:")
print(average_expenditure)
Output: -
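The expected averages are easy to confirm by hand: Electronics (1200 + 800) / 2 = 1000, Clothing (50 + 60) / 2 = 55, Groceries (30 + 20) / 2 = 25. As a possible extension, agg() can report several statistics per category in one call; the sketch below uses the same data, and the name summary is illustrative.

```python
import pandas as pd

QtrSales = pd.DataFrame({
    "Category": ["Electronics", "Electronics", "Clothing",
                 "Clothing", "Groceries", "Groceries"],
    "Item": ["Laptop", "Smartphone", "T-shirt", "Jeans", "Milk", "Bread"],
    "Expenditure": [1200, 800, 50, 60, 30, 20],
})

# One row per category, one column per statistic.
summary = QtrSales.groupby("Category")["Expenditure"].agg(["mean", "sum", "count"])
print(summary)
```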
Experiment-14
Create a DataFrame having age, name, weight of five
students. Write a program to display only the weight of first
and fourth rows.
Code: -
import pandas as pd
# Sample data for five students
data = {
'Name': ['Madhav', 'Shyam', 'Murari', 'Gopal', 'Keshava'],
'Age': [20, 21, 19, 22, 20],
'Weight': [55, 70, 60, 80, 65] # Weight in kg
}
# Create DataFrame
students_df = pd.DataFrame(data)
# Display the DataFrame
print("Students DataFrame:")
print(students_df)
# Display the weight of the first and fourth rows
weights = students_df.iloc[[0, 3]]['Weight']
print("\nWeight of the first and fourth students:")
print(weights)
Output: -
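The selection in this experiment chains two steps, iloc for the rows and then a column lookup. An equivalent single-step form, shown here as an alternative sketch with the same data, passes the rows and the column together. With the default integer index, loc labels 0 and 3 coincide with the first and fourth positions.

```python
import pandas as pd

students_df = pd.DataFrame({
    "Name": ["Madhav", "Shyam", "Murari", "Gopal", "Keshava"],
    "Age": [20, 21, 19, 22, 20],
    "Weight": [55, 70, 60, 80, 65],  # Weight in kg
})

# Label-based: rows labelled 0 and 3, column "Weight".
weights_loc = students_df.loc[[0, 3], "Weight"]

# Position-based: look up the column position first.
weight_pos = students_df.columns.get_loc("Weight")
weights_iloc = students_df.iloc[[0, 3], weight_pos]

print(weights_loc)
print(weights_iloc)
```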
Experiment-15
Write a program to create a DataFrame to store weight, age
and name of three people. Print the DataFrame and its
transpose.
Code: -
import pandas as pd
# Sample data for three people
data = {
'Name': ['Keshava', 'Madhav', 'Murari'],
'Age': [25, 30, 35],
'Weight': [55, 70, 80] # Weight in kg
}
# Create DataFrame
people_df = pd.DataFrame(data)
# Display the DataFrame
print("DataFrame:")
print(people_df)
# Print the transpose of the DataFrame
print("\nTranspose of the DataFrame:")
print(people_df.T)
Output: -
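One side effect of transposing is worth noting. Each column of the transposed frame holds a name, an age and a weight, so the numeric columns lose their integer dtype and every column becomes object. The sketch below prints the dtypes before and after, using the same data.

```python
import pandas as pd

people_df = pd.DataFrame({
    "Name": ["Keshava", "Madhav", "Murari"],
    "Age": [25, 30, 35],
    "Weight": [55, 70, 80],  # Weight in kg
})

transposed = people_df.T

print(people_df.dtypes)    # Age and Weight are integer columns
print(transposed.dtypes)   # every column mixes text and numbers
print(transposed)
```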