
Programs MLT Lab Print

The document outlines several exercises related to data analysis and machine learning using Python, including calculating average salary, data processing methods, exploratory data analysis (EDA), linear regression, decision trees, and K-means clustering. Each exercise includes code snippets demonstrating the implementation of various data manipulation and analysis techniques using libraries like pandas, numpy, and scikit-learn. The document serves as a practical guide for students at K. Ramakrishnan College of Engineering to apply data science concepts.

Uploaded by

kumarnithishhello


K. Ramakrishnan College of Engineering (Autonomous), Trichy

EX.NO:1 CALCULATION OF AVERAGE SALARY USING DATAFRAME

PROGRAM

import pandas as pd

import numpy as np

# Create the DataFrame with random integers

df = pd.DataFrame({

'Age': np.random.randint(18, 61, size=10), # 61 is exclusive

'Salary': np.random.randint(30000, 100001, size=10)})

# Display the DataFrame

print("Generated DataFrame:")

print(df)

# Calculate and display the average salary

average_salary = df['Salary'].mean()

print(f"\nAverage Salary: ₹{average_salary:,.2f}")
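Because the program above draws new random values on every run, the printed average changes each time. A minimal sketch, assuming a fixed seed is acceptable for the lab record (the seed value 42 and the names `rng` and `manual_average` are illustrative and not part of the original program):

```python
import numpy as np
import pandas as pd

# Assumption: a fixed seed (42) is used only to make the run repeatable.
rng = np.random.default_rng(42)

df = pd.DataFrame({
    'Age': rng.integers(18, 61, size=10),            # upper bound 61 is exclusive
    'Salary': rng.integers(30000, 100001, size=10),  # upper bound 100001 is exclusive
})

average_salary = df['Salary'].mean()

# The pandas mean agrees with the plain definition: sum divided by count.
manual_average = df['Salary'].sum() / len(df)

print(df)
print(f"Average Salary: {average_salary:,.2f}")
```

Re-running the sketch with the same seed should print the same DataFrame and the same average.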

OUTPUT

Department of Artificial Intelligence and Data Science Page


EX.NO:2 IMPLEMENTATION OF DATA PROCESSING METHODS

PROGRAM

import pandas as pd

import numpy as np

# Sample data with some null values

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', None],
    'Age': [25, 30, None, 22, 29],
    'Salary': [50000, None, 62000, 58000, 60000]
}

# Create DataFrame

df = pd.DataFrame(data)

print("Original DataFrame:\n", df)

# 1. Dropping rows with any null value

df_dropped = df.dropna()

print("\nDataFrame after dropping rows with null values:\n", df_dropped)

# 2. Filling null values

df_filled = df.fillna({

'Name': 'Unknown',

'Age': df['Age'].mean(), # fill Age with average

'Salary': df['Salary'].median() # fill Salary with median

})


print("\nDataFrame after filling null values:\n", df_filled)

# 3. Selecting specific data

selected_rows = df_filled[df_filled['Salary'] > 55000]

print("\nSelected rows where Salary > 55000:\n", selected_rows)

# 4. Convert 'Name' column into a list

name_list = df_filled['Name'].tolist()

print("\n'Name' column as list:\n", name_list)

OUTPUT
Original DataFrame:
      Name   Age   Salary
0    Alice  25.0  50000.0
1      Bob  30.0      NaN
2  Charlie   NaN  62000.0
3    David  22.0  58000.0
4     None  29.0  60000.0

DataFrame after dropping rows with null values:
    Name   Age   Salary
0  Alice  25.0  50000.0
3  David  22.0  58000.0

DataFrame after filling null values:
      Name   Age   Salary
0    Alice  25.0  50000.0
1      Bob  30.0  59000.0
2  Charlie  26.5  62000.0
3    David  22.0  58000.0
4  Unknown  29.0  60000.0

Selected rows where Salary > 55000:
      Name   Age   Salary
1      Bob  30.0  59000.0
2  Charlie  26.5  62000.0
3    David  22.0  58000.0
4  Unknown  29.0  60000.0

'Name' column as list:

['Alice', 'Bob', 'Charlie', 'David', 'Unknown']
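The two fill values used in step 2 can be checked by hand: the mean of the four known ages and the median of the four known salaries. A minimal sketch using the same Age and Salary values as the program (the names `age_fill` and `salary_fill` are illustrative):

```python
import pandas as pd

age = pd.Series([25, 30, None, 22, 29])
salary = pd.Series([50000, None, 62000, 58000, 60000])

# mean() and median() skip missing values by default (skipna=True).
age_fill = age.mean()          # (25 + 30 + 22 + 29) / 4
salary_fill = salary.median()  # midpoint of 58000 and 60000

print(age_fill, salary_fill)
```

With an even number of known salaries, the median is the average of the two middle values after sorting (50000, 58000, 60000, 62000).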


EX.NO:3 IMPLEMENTATION OF EDA

PROGRAM

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Set seed for reproducibility
np.random.seed(42)
# Create synthetic dataset

df = pd.DataFrame({
'Age': np.random.randint(22, 60, size=100),
'Salary': np.random.randint(30000, 120000, size=100),
'Experience_Years': np.random.randint(0, 35, size=100),
'Performance_Score': np.random.normal(loc=6, scale=1.5, size=100).round(1),
'Training_Hours': np.random.randint(0, 100, size=100)
})
# Inject some missing values
df.loc[np.random.choice(df.index, size=5, replace=False), 'Performance_Score'] = np.nan  # 5 distinct rows

# Display first few rows
print("Sample of the dataset:")
print(df.head())

# Summary statistics
print("\nSummary Statistics:")
print(df.describe())
# Check for missing data
print("\nMissing Values:")
print(df.isnull().sum())
# Histograms of numeric features
df.hist(figsize=(12, 8), edgecolor='black')
plt.suptitle('Histogram of Numeric Features', fontsize=16)
plt.tight_layout()
plt.show()
# Boxplots for all numeric features

plt.figure(figsize=(12, 6))
sns.boxplot(data=df,palette='Set3')
plt.title('Boxplot of Numeric Features', fontsize=16)
plt.xticks(rotation=45)
plt.show()
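The points drawn beyond the whiskers of a boxplot are, with the default whisker length of 1.5, the values lying more than 1.5 times the interquartile range (IQR) outside the quartiles. A minimal sketch of that rule, assuming a small hypothetical score column (the helper `iqr_outliers` and the sample values are illustrative, not part of the original program):

```python
import pandas as pd

def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Comparisons with NaN are False, so missing values are never flagged.
    return series[(series < lower) | (series > upper)]

# Hypothetical sample: nine ordinary scores plus one extreme value.
scores = pd.Series([5.0, 5.5, 6.0, 6.0, 6.5, 6.5, 7.0, 7.0, 7.5, 15.0])
outliers = iqr_outliers(scores)
print(outliers.tolist())
```

Here Q1 = 6.0 and Q3 = 7.0, so the bounds are 4.5 and 8.5 and only the value 15.0 falls outside them.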


OUTPUT


EX.NO:4 IMPLEMENTATION OF SIMPLE LINEAR REGRESSION

PROGRAM

import pandas as pd

import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split

from sklearn.linear_model import LinearRegression

from sklearn.metrics import mean_squared_error, r2_score
# Step 1: Load dataset

df = pd.read_csv('placement.csv')

# Step 2: Display first few rows to understand structure
print("First 5 rows of the dataset:")

print(df.head())

# Step 3: Select input and output variables

# Here "placement_exam_marks" is the independent variable (X) and "placed" is the dependent variable (y)

X = df[['placement_exam_marks']]
y = df['placed']

# Step 4: Split the data into training and testing sets (80% train, 20% test)
X_train,X_test,y_train,y_test=train_test_split(X,y, test_size=0.2, random_state=42)

# Step 5: Train the Linear Regression model

model = LinearRegression()
model.fit(X_train, y_train)

# Step 6: Make predictions

y_pred = model.predict(X_test)

# Step 7: Evaluate the model

print("\nModel Performance:")

print("Mean Squared Error:", mean_squared_error(y_test, y_pred))
print("R^2 Score:", r2_score(y_test, y_pred))

# Step 8: Visualize the results

plt.scatter(X_test, y_test, color='blue', label='Actual')
plt.plot(X_test, y_pred, color='red', linewidth=1, label='Predicted')
plt.xlabel('Placement Exam Marks')

plt.ylabel('Placed')
plt.title('Simple Linear Regression')
plt.legend()

plt.grid(True)

plt.show()

OUTPUT
First 5 rows of the dataset:
cgpa placement_exam_marks placed
0 7.19 26.0 1
1 7.46 38.0 1
2 7.54 40.0 1
3 6.42 8.0 1
4 7.23 17.0 0

Model Performance:
Mean Squared Error: 0.2501852253844579
R^2 Score: -0.005668678060327448
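An R^2 score slightly below zero, as reported above, means the fitted line predicts the test labels marginally worse than a constant equal to their mean. A minimal sketch of the definition R^2 = 1 - SS_res / SS_tot, using hypothetical 0/1 labels rather than the placement data (the helper `r2_manual` and all values are illustrative):

```python
import numpy as np

def r2_manual(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

# Hypothetical binary labels, similar in kind to a 0/1 'placed' column.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])

# Predicting the mean of y_true for every sample gives R^2 = 0.
baseline = np.full(y_true.shape, y_true.mean())
# Any other constant prediction gives a negative R^2.
shifted = np.full(y_true.shape, 0.6)

r2_baseline = r2_manual(y_true, baseline)
r2_shifted = r2_manual(y_true, shifted)
print(r2_baseline, r2_shifted)
```

Since `placed` takes only the values 0 and 1, a near-zero R^2 here suggests that exam marks alone explain little of the placement outcome in a straight-line fit.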


EX.NO:5 IMPLEMENTATION OF DECISION TREE

PROGRAM

import pandas as pd
df1=pd.read_csv("Symptom-severity.csv")
df2=pd.read_csv("dataset.csv")
df3=pd.read_csv("symptom_Description.csv")
df4=pd.read_csv("symptom_precaution.csv")
df1.head()

df2.head()

OUTPUT


df3.head()

df4.head()

res = pd.concat([df1,df2,df3,df4])

res


res.info()

res = res.fillna(res.median(numeric_only=True))

# Fill missing text values in each object column with that column's most frequent value
for col in res.select_dtypes(include='object'):
    mode_val = res[col].mode()
    if not mode_val.empty:
        res[col] = res[col].fillna(mode_val[0])

res


res.describe()

res.info()


res['Precaution_Merged']=res[['Precaution_1','Precaution_2','Precaution_3',
'Precaution_4']].astype(str).agg(', '.join, axis=1)
res['Symptom_Merged']=res[['Symptom','Symptom_1','Symptom_2','Symptom_3','Symptom_4',
'Symptom_5','Symptom_6','Symptom_7','Symptom_8','Symptom_9',
'Symptom_10','Symptom_11','Symptom_12','Symptom_13','Symptom_14','Symptom_15',
'Symptom_17']].astype(str).agg(', '.join, axis=1)
res.head()

input=res[['weight','Description','Precaution_Merged','Symptom_Merged']].copy() # copy so that new columns can be added without a SettingWithCopyWarning

target=res['Disease']
input.head()


target.head()

from sklearn.preprocessing import LabelEncoder

le_Description=LabelEncoder()
le_Precaution_Merged=LabelEncoder()
le_Symptom_Merged=LabelEncoder()
input['Description_n']=le_Description.fit_transform(input['Description'])
input['Precaution_Merged_n']=le_Precaution_Merged.fit_transform(input['Precaution_Merged'])
input['Symptom_Merged_n']=le_Symptom_Merged.fit_transform(input['Symptom_Merged'])

input.head()


input=input.drop(['Description','Precaution_Merged','Symptom_Merged'],
axis='columns')
input.head()

le_Disease=LabelEncoder()
target=le_Disease.fit_transform(target)
target
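Label encoding replaces each distinct label with an integer code, where the codes follow the sorted order of the labels. A minimal sketch of that mapping, using `np.unique` as a stand-in for `LabelEncoder` (a swapped-in technique for illustration) and hypothetical disease names:

```python
import numpy as np

# Hypothetical labels, not taken from the dataset used above.
labels = np.array(['Malaria', 'Allergy', 'Dengue', 'Allergy', 'Malaria'])

# classes: sorted unique labels; codes: position of each label within classes.
classes, codes = np.unique(labels, return_inverse=True)

print(classes)
print(codes)

# Decoding is an index lookup, the counterpart of inverse_transform.
decoded = classes[codes]
print(decoded)
```

The integer codes carry no real ordering between diseases; they are only identifiers, which is acceptable for a target column but should be kept in mind when text features are encoded the same way.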

from sklearn.model_selection import train_test_split

x_train,x_test,y_train,y_test=train_test_split(input,target,test_size=0.33,random_state=42)
from sklearn import tree
model=tree.DecisionTreeClassifier(max_depth=3)
model.fit(x_train,y_train)
import matplotlib.pyplot as plt

from sklearn.tree import DecisionTreeClassifier, plot_tree

import numpy as np
plt.figure(figsize=(20, 10))


plot_tree(

model,

feature_names=input.columns,

class_names=[str(c) for c in sorted(np.unique(target))],

filled=True,

rounded=True,

fontsize=10, max_depth=3)

plt.title('Decision Tree')
plt.show()
OUTPUT


EX.NO:6 IMPLEMENTATION OF K MEANS ALGORITHM

PROGRAM

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import warnings
warnings.filterwarnings('ignore')
df = pd.read_csv('Instagram visits clustering.csv')
df

plt.scatter(df['Instagram visit score'],df['Spending_rank(0 to 100)'])


OUTPUT

#Building a K-means Clustering model

from sklearn.cluster import KMeans
wcss = []
for i in range(1,11):
    km = KMeans(n_clusters=i)
    km.fit_predict(df)
    wcss.append(km.inertia_) # within-cluster sum of squares for k = i

wcss


plt.plot(range(1,11),wcss)
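The quantity plotted for the elbow method, `inertia_`, is the within-cluster sum of squares (WCSS): the sum of squared distances from each sample to its nearest cluster centre. A minimal sketch that computes it by hand for fixed centroids, assuming four hypothetical 2-D points (the function name `wcss_manual` and the data are illustrative):

```python
import numpy as np

def wcss_manual(points: np.ndarray, centroids: np.ndarray) -> float:
    """Sum of squared distances from each point to its nearest centroid."""
    # Pairwise differences, shape (n_points, n_centroids, n_features).
    diffs = points[:, None, :] - centroids[None, :, :]
    sq_dist = np.sum(diffs ** 2, axis=2)
    # Keep only the distance to the closest centroid for each point.
    return float(np.sum(sq_dist.min(axis=1)))

# Hypothetical points forming two tight groups.
points = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])

one_centroid = points.mean(axis=0, keepdims=True)      # k = 1: overall mean
two_centroids = np.array([[0.0, 1.0], [10.0, 1.0]])    # k = 2: one per group

wcss_k1 = wcss_manual(points, one_centroid)
wcss_k2 = wcss_manual(points, two_centroids)
print(wcss_k1, wcss_k2)
```

The sharp drop from k = 1 to k = 2 in this toy case is the kind of bend the elbow plot is read for; adding further clusters would reduce the value only slightly.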

X = df.iloc[:,:].values #numpy array

km = KMeans(n_clusters=4)
y_means = km.fit_predict(X)
y_means


X[y_means==0]

X[y_means==1]

X[y_means==2]

X[y_means==3]

X[y_means == 3,1]


array([ 70., 79., 28., 49., 92., 90., 80., 84., 76., 90., 36.,
84., 84., 49., 96., 43., 71., 10., 102., 78., 86., 50.,
71., 82., 18., 88., 74., 40., 92., 22., 118., 84., 27.,
78., 84., 86., 82., 72., 59., 66., 54., 79., 103., 51.,
45., 67., 93., 47., 74., 46., 90., 40., 11., 54., 84.,
99., 43., 20., 92., 22., 87., 53., 25., 69., 92., 86.,
32., 67., 56., 47., 88., 89., 20., 54., 86., 78., 9.,
36., 80., 67., 87., 28., 50., 88., 36., 71., 90., 67.,
80., 85., 95., 70., 80., 77., 34., 89., 81., 81., 96.,

71., 91., 46., 22., 86., 73., 24., 75., 15., 71., 55.,
38., 85., 62., 91., 23., 58., 32., 82., 23., 81., 86.,
97., 70., 100., 78., 64., 30., 21., 86., 86., 94., 56.,
39., 24., 84., 77., 90., 52., 79., 86., 24., 33., 77.,
87., 61., 90., 67., 26., 48., 27., 61., 89., 14., 53.,
97., 36., 83., 92., 83., 84., 93., 24., 28., 14., 90.,

78., 62., 73., 75., 72., 78., 80., 33., 91., 32., 91.,
90., 21., 70., 80., 66., 64., 89., 85., 25., 88., 14.,
64., 87., 26., 29., 70., 42., 77., 53., 28., 83., 101.,
92., 100., 10., 58., 88., 36., 38., 95., 36., 27., 97.,
100., 40., 88., 87., 85., 91., 41., 28., 96., 79., 89.,
78., 17., 70., 28., 91., 54., 66., 102., 80., 76., 16.,
28., 24., 74., 41., 41., 49., 44., 18., 74., 88., 77.,
82., 82., 43., 25., 47., 64., 81., 78., 22., 81., 38.,
61., 89., 77., 30., 45., 30., 22., 51., 13., 65., 85.,


84., 59., 84., 92., 21., 101., 18., 73., 35., 26., 24.,
37., 61., 90., 33., 32., 89., 98., 87., 83., 67., 56.,
51., 17., 40., 86., 86., 75., 78., 91., 69., 91., 33.,
28., 99., 26., 59., 22., 59., 52., 36., 89., 92., 64.,
33., 27., 82., 79., 47., 42., 82., 24., 90., 24., 28.,
97., 28., 22., 34., 18., 29., 78., 63., 70., 51., 66.,
87., 69., 80., 42., 70., 78., 68., 85., 54., 80., 48.,
39., 85., 31., 76., 30., 56., 93., 90., 104., 58., 74.,
32., 81., 83., 93., 15., 43., 91., 82., 22., 96., 64.,

18., 9., 37., 79., 86., 81., 95., 93., 59., 66., 40.,
58., 24., 41., 97., 79., 102., 73., 45., 78., 31., 61.,
61., 86., 78., 72., 82., 31., 87., 92., 56., 63., 38.,
88., 79., 48., 45., 92., 77., 48., 12., 86., 38., 76.,
22., 44., 94., 87., 85., 77., 90., 32., 52., 89., 65.,
31., 21., 92., 69., 98., 68., 90., 60., 86., 41., 89.,
83., 83., 43., 19., 13., 28., 80., 82., 7., 83., 41.,
81., 91., 94., 39., 53., 88., 102., 19., 79., 85., 94.,
34., 61., 80., 20., 60., 90., 60., 91., 91., 91., 73.,
92., 89., 89., 101., 87., 62., 69., 30., 114., 72., 59.,
45., 73., 38., 92., 95., 85., 80., 48., 73., 88., 28.,
26., 72., 104., 92., 86., 23., 88., 40., 88., 75., 97.,
88., 59., 90., 92., 35., 88., 82., 40., 78., 77., 87.,
77., 28., 89., 25., 99., 85., 24., 27., 42., 88., 85.,


81., 81., 72., 20., 83., 89., 44., 18., 88., 48., 65.,
90., 64., 76., 45., 69., 37., 83., 80., 87., 77., 15.,
31., 35., 37., 72., 46., 83., 90., 48., 86., 33., 90.,
26., 52., 86., 83., 75., 85., 92., 93., 81., 83., 90.,
27., 26., 82., 94., 30., 62., 20., 57., 75., 80., 33.,
104., 40., 88., 80., 5., 27., 27., 74., 61., 42., 64.,
27., 56., 82., 79., 96., 91., 72., 60., 40., 93., 85.,
98., 32., 90., 83., 91., 41., 19., 20., 43., 27., 79.,
86., 83., 31., 47., 84., 94., 84., 94., 82., 84., 37.,
66., 41., 86., 99., 40., 76., 102., 76., 90., 87., 77.,
37., 88., 76., 36., 28., 99., 96., 34., 91., 82., 84.,
27., 61., 95., 29., 38., 96., 70., 95.])

plt.scatter(X[y_means == 0,0],X[y_means == 0,1],color='blue')

plt.scatter(X[y_means == 1,0],X[y_means == 1,1],color='red')
plt.scatter(X[y_means == 2,0],X[y_means == 2,1],color='green')
plt.scatter(X[y_means == 3,0],X[y_means == 3,1],color='yellow')


from sklearn.datasets import make_blobs # k_means on 3D

centroids = [(-5,-5,5),(5,5,-5),(3.5,-2.5,4),(-2.5,2.5,-4)]
cluster_std = [1,1,1,1]
X,y=make_blobs(n_samples=200,cluster_std=cluster_std,centers=centroids,

n_features= 3,random_state=1)
X
array([[ 4.33424548, 3.32580419, 4.17497018],
[-3.32246719, 3.22171129, -4.625342 ],
[-6.07296862, -4.13459237, 2.6984613 ],
[ 6.90465871, 6.1110567 , -4.3409502 ],
[-2.60839207, 2.95015551, -2.2346649 ],
[ 5.88490881, 4.12271848, -5.86778722],
[-4.68484061, -4.15383935, 4.14048406],
[-1.82542929, 3.96089238, -3.4075272 ],


[-5.34385368, -4.95640314, 4.37999916],

[ 4.91549197, 4.70263812, -4.582698 ],
[-3.80108212, -4.81484358, 4.62471505],
[ 4.6735005 , 3.65732421, -3.88561702],
[-6.23005814, -4.4494625 , 5.79280687],
[-3.90232915, 2.95112294, -4.6949209 ],
[ 3.72744124, 5.31354772, -4.49681519],
[-3.3088472 , 3.05743945, -3.81896126],
[ 2.70273021, -2.21732429, 3.17390257],
[ 4.06438286, -0.36217193, 3.214466 ],
[ 4.69268607, -2.73794194, 5.15528789],
[ 4.1210827 , -1.5438783 , 3.29415949],
[-6.61577235, -3.87858229, 5.40890054],
[ 3.05777072, -2.17647265, 3.89000851],
[-1.48617753, 0.27288737, -5.6993336 ],
[-5.3224172 , -5.38405435, 6.13376944],
[-5.26621851, -4.96738545, 3.62688268],
[ 5.20183018, 5.66102029, -3.20784179],
[-2.9189379 , 2.02081508, -5.95210529],
[ 3.30977897, -2.94873803, 3.32755196],
[ 5.12910158, 6.6169496 , -4.49725912],
[-2.46505641, 3.95391758, -3.33831892],
[ 1.46279877, -4.44258918, 1.49355935],
[ 3.87798127, 4.48290554, -5.99702683],
[ 4.10944442, 3.8808846 , -3.0439211 ],


[-6.09989127, -5.17242821, 4.12214158],

[-3.03223402, 3.6181334 , -3.3256039 ],
[ 7.44936865, 4.45422583, -5.19883786],
[-4.47053468, -4.86229879, 5.07782113],
[-1.46701622, 2.27758597, -2.52983966],
[ 3.0208429 , -2.14983284, 4.01716473],
[ 3.82427424, -2.47813716, 3.53132618],
[-5.74715829, -3.3075454 , 5.05080775],
[-1.51364782, 2.03384514, -2.61500866],
[-4.80170028, -4.88099135, 4.32933771],
[ 6.55880554, 5.1094027 , -6.2197444 ],
[-1.48879294, 1.02343734, -4.14319575],
[ 4.30884436, -0.71024532, 4.45128402],
[ 3.58646441, -4.64246673, 3.16983114],
[ 3.37256166, 5.60231928, -4.5797178 ],
[-1.39282455, 3.94287693, -4.53968156],
[-4.64945402, -6.31228341, 4.96130449],
[ 3.88352998, 5.0809271 , -5.18657899],
[ 3.32454103, -3.43391466, 3.46697967],
[ 3.45029742, -2.03335673, 5.03368687],
[-2.95994283, 3.14435367, -3.62832971],
[-3.03289825, -6.85798186, 6.23616403],
[-4.13665468, -5.1809203 , 4.39607937],
[-3.6134361 , 2.43258998, -2.83856002],
[ 2.07344458, -0.73204005, 3.52462712],


[ 4.11798553, -2.68417633, 3.88401481],

[ 3.60337958, 4.13868364, -4.32528847],
[-5.84520564, -5.67124613, 4.9873354 ],
[-2.41031359, 1.8988432 , -3.44392649],
[-2.75898285, 2.6892932 , -4.56378873],
[-2.442879 , 1.70045251, -4.2915946 ],
[ 3.9611641 , -3.67598267, 5.01012718],
[-7.02220122, -5.30620401, 5.82797464],
[ 2.90019547, -1.37658784, 4.30526704],
[ 5.81095167, 6.04444209, -5.40087819],
[-5.75439794, -3.74713184, 5.51292982],
[-2.77584606, 3.72895559, -2.69029409],
[ 3.07085772, -1.29154367, 5.1157018 ],
[ 2.206915 , 6.93752881, -4.63366799],
[ 4.2996015 , 4.79660555, -4.75733056],
[ 4.86355526, 4.88094581, -4.98259059],
[-4.38161974, -4.76750544, 5.68255141],
[ 5.42952614, 4.3930016 , -4.89377728],
[ 3.69427308, 4.65501279, -5.23083974],
[ 5.90148689, 7.52832571, -5.24863478],
[-4.87984105, -4.38279689, 5.30017032],
[ 3.93816635, -1.37767168, 3.0029802 ],
[-3.32862798, 3.02887975, -6.23708651],
[-4.76990526, -4.23798882, 4.77767186],
[-2.12754315, 2.3515102 , -4.1834002 ],


[-0.64699051, 2.64225137, -3.48649452],

[-5.63699565, -4.80908452, 7.10025514],
[-1.86341659, 3.90925339, -2.37908771],
[ 4.82529684, 5.98633519, -4.7864661 ],
[-5.24937038, -3.53789206, 2.93985929],
[-4.59650836, -4.40642148, 3.90508815],
[-3.66400797, 3.19336623, -4.75806733],
[ 6.29322588, 4.88955297, -5.61736206],
[-2.85340998, 0.71208711, -3.63815268],
[-2.35835946, -0.01630386, -4.59566788],
[ 5.61060505, -3.80653407, 4.07638048],
[-1.78695095, 3.80620607, -4.60460297],
[-6.11731035, -4.7655843 , 6.65980218],
[-5.63873041, -4.57650565, 5.07734007],
[ 5.62336218, 4.56504332, -3.59246 ],
[-3.37234925, -4.6619883 , 3.80073197],
[-5.69166075, -5.39675353, 4.3128273 ],
[ 7.19069973, 3.10363908, -5.64691669],
[-3.86837061, -3.48018318, 7.18557541],
[-4.62243621, -4.87817873, 6.12948391],
[ 5.21112476, 5.01652757, -4.82281228],
[-2.61877117, 2.30100182, -2.13352862],
[-2.92449279, 1.76846902, -5.56573815],
[-2.80912132, 3.01093777, -2.28933816],
[ 4.35328122, -2.91302931, 5.83471763],


[ 2.79865557, -3.03722302, 4.15626385],

[-3.65498263, 2.3223678 , -5.51045638],
[ 4.8887794 , -3.16134424, 7.03085711],
[ 4.94317552, 5.49233656, -5.68067814],
[ 3.97761018, -3.52188594, 4.79452824],
[-3.41844004, 2.39465529, -3.36980433],
[ 3.50854895, -2.66819884, 3.82581966],
[-2.63971173, 3.88631426, -3.45187042],
[-3.37565464, -5.61175641, 4.47182825],
[-2.37162301, 4.26041518, -3.03346075],
[ 1.81594001, -3.6601701 , 5.35010682],
[ 5.04366899, 4.77368576, -3.66854289],
[-4.19813897, -4.9534327 , 4.81343023],
[ 5.1340482 , 6.20205486, -4.71525189],
[ 3.39320601, -1.04857074, 3.38196315],
[ 4.34086156, -2.60288722, 5.14690038],
[-0.80619089, 2.69686978, -3.83013074],
[-5.62353073, -4.47942366, 3.85565861],
[ 5.56578332, -3.97115693, 3.1698281 ],
[ 4.41347606, 3.76314662, -4.12416107],
[ 4.01507361, -5.28253447, 4.58464661],
[-5.02461696, -5.77516162, 6.27375593],
[ 5.55635552, -0.73975077, 3.93934751],
[-5.20075807, -4.81343861, 5.41005165],
[-2.52752939, 4.24643509, -4.77507029],


[-3.85527629, -4.09840928, 5.50249434],

[ 5.78477065, 4.04457474, -4.41408957],
[ 1.74407436, -1.7852104 , 4.85270406],
[ 3.27123417, -0.88663863, 3.62519531],
[ 7.18697965, 5.44136444, -5.10015523],
[-2.78899734, 2.10818376, -3.31599867],
[-3.37000822, 2.86919047, -3.14671781],
[-4.30196797, -5.44712856, 6.2245077 ],
[ 3.95541062, 7.05117344, -4.414338 ],
[ 3.55912398, 6.23225307, -5.25417987],
[-3.09384307, 2.15609929, -5.00016919],
[-5.93576943, -5.26788808, 5.53035547],
[ 5.83600472, 6.54335911, -4.24119434],
[ 4.68988323, 2.56516224, -3.9611754 ],
[-5.29809284, -4.51148185, 4.92442829],
[-1.30216916, 4.20459417, -2.95991085],
[ 4.9268873 , 6.16033857, -4.63050728],
[-3.30618482, 2.24832579, -3.61728483],
[ 4.50178644, 4.68901502, -5.00189148],
[ 3.86723181, -1.26710081, 3.57714304],
[ 4.32458463, -1.84541985, 3.94881155],
[ 4.87953543, 3.76687926, -6.18231813],
[ 3.51335268, -3.1946936 , 4.6218035 ],
[-4.83061757, -4.25944355, 4.0462994 ],
[-1.6290302 , 1.99154287, -3.22258079],


[ 1.62683902, -1.57938488, 3.96463208],

[ 6.39984394, 4.21808832, -5.43750898],
[ 5.82400562, 4.43769457, -3.04512192],
[-3.25518824, -5.7612069 , 5.3190391 ],
[-4.95778625, -4.41718479, 3.89938082],
[ 2.75003038, -0.4453759 , 4.05340954],
[ 3.85249436, -2.73643695, 4.7278135 ],
[-5.10174587, -4.13111384, 5.75041164],
[-4.83996293, -4.12383108, 5.31563495],
[ 1.086497 , -4.27756638, 3.22214117],
[ 4.61584111, -2.18972771, 1.90575218],
[-4.25795584, -5.19183555, 4.11237104],
[ 5.09542509, 5.92145007, -4.9392498 ],
[-6.39649634, -6.44411381, 4.49553414],
[ 5.26246745, 5.2764993 , -5.7332716 ],
[ 3.5353601 , -4.03879325, 3.55210482],
[ 5.24879916, 4.70335885, -4.50478868],
[ 5.61853913, 4.55682807, -3.18946509],
[-2.39265671, 1.10118718, -3.91823218],
[ 3.16871683, -2.11346085, 3.14854434],
[ 3.95161595, -1.39582567, 3.71826373],
[-4.09914405, -5.68372786, 4.87710977],
[-1.9845862 , 1.38512895, -4.76730983],
[-1.45500559, 3.1085147 , -4.0693287 ],
[ 2.94250528, -1.56083126, 2.05667659],


[ 2.77440288, -3.36776868, 3.86402267],

[ 4.50088142, -2.88483225, 5.45810824],
[-5.35224985, -6.1425182 , 4.65065728],
[-2.9148469 , 2.95194604, -5.57915629],
[-4.06889792, -4.71441267, 5.88514116],
[ 3.47431968, 5.79502609, -5.37443832],
[ 3.66804833, 3.23931144, -6.65072127],
[-3.22239191, 3.59899633, -4.90163449],
[-3.6077125 , 2.48228168, -5.71939447],
[ 5.5627611 , 5.24073709, -4.71933492],
[ 1.38583608, -2.91163916, 5.27852808],
[ 4.42001793, -2.69505734, 4.80539342],
[ 4.71269214, 5.68006984, -5.3198016 ],
[-4.13744959, 6.4586027 , -3.35135636],
[-5.20889423, -4.41337681, 5.83898341],
[ 2.6194224 , -2.77909772, 5.62284909],
[-1.3989998 , 3.28002714, -4.6294416 ]])
wcss = [] #Elbow Method

for i in range(1,21):
    km = KMeans(n_clusters=i)
    km.fit_predict(X)
    wcss.append(km.inertia_)
plt.plot(range(1,21),wcss)


km=KMeans(n_clusters=4)

y_pred = km.fit_predict(X)

df = pd.DataFrame()

df['col1'] = X[:,0]
df['col2'] = X[:,1]
df['col3'] = X[:,2]
df['label'] = y_pred
df


y_pred

OUTPUT

array([0, 2, 1, 0, 2, 0, 1, 2, 1, 0, 1, 0, 1, 2, 0, 2, 3, 3, 3, 3, 1, 3,
2, 1, 1, 0, 2, 3, 0, 2, 3, 0, 0, 1, 2, 0, 1, 2, 3, 3, 1, 2, 1, 0,
2, 3, 3, 0, 2, 1, 0, 3, 3, 2, 1, 1, 2, 3, 3, 0, 1, 2, 2, 2, 3, 1,
3, 0, 1, 2, 3, 0, 0, 0, 1, 0, 0, 0, 1, 3, 2, 1, 2, 2, 1, 2, 0, 1,
1, 2, 0, 2, 2, 3, 2, 1, 1, 0, 1, 1, 0, 1, 1, 0, 2, 2, 2, 3, 3, 2,
3, 0, 3, 2, 3, 2, 1, 2, 3, 0, 1, 0, 3, 3, 2, 1, 3, 0, 3, 1, 3, 1,
2, 1, 0, 3, 3, 0, 2, 2, 1, 0, 0, 2, 1, 0, 0, 1, 2, 0, 2, 0, 3, 3,
0, 3, 1, 2, 3, 0, 0, 1, 1, 3, 3, 1, 1, 3, 3, 1, 0, 1, 0, 3, 0, 0,
2, 3, 3, 1, 2, 2, 3, 3, 3, 1, 2, 1, 0, 0, 2, 2, 0, 3, 3, 0, 2, 1,
3, 2], dtype=int32)


EX.NO:7 CREATION OF MATRIX USING NUMPY

PROGRAM

import numpy as np
# Step 1: Create a 3x4 matrix with values from 10 to 21
matrix = np.arange(10, 22).reshape(3, 4)

print("Original Matrix:\n", matrix)

# Step 2: Replace value 21 with a square number (e.g., 25)

matrix[matrix == 21] = 25
print("\nMatrix after replacing 21 with 25:\n", matrix)

OUTPUT

Original Matrix:
[[10 11 12 13]
 [14 15 16 17]
 [18 19 20 21]]

Matrix after replacing 21 with 25:
[[10 11 12 13]
 [14 15 16 17]
 [18 19 20 25]]
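Step 2 gives 25 only as an example of a square number. A minimal sketch, assuming the intended replacement is the smallest perfect square not below the replaced value (the helper `next_square` is hypothetical and not part of the original program):

```python
import math
import numpy as np

def next_square(n: int) -> int:
    """Smallest perfect square that is greater than or equal to n."""
    root = math.isqrt(n)
    return n if root * root == n else (root + 1) ** 2

matrix = np.arange(10, 22).reshape(3, 4)
target = 21

# Boolean mask: True only where the element equals the target value.
mask = matrix == target
matrix[mask] = next_square(target)

print(matrix)
```

The mask has the same shape as the matrix, so the assignment touches only the matching element and leaves the rest unchanged.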


EX.NO:8 IMPLEMENTATION OF BASIC EDA OPERATIONS WITH DATASET

PROGRAM

import pandas as pd
df=pd.read_csv("Symptom-severity.csv")
df.head()
df.isnull().sum()
df.dtypes
df.describe()
df['weight'].value_counts()
df['Symptom'].value_counts()

OUTPUT


EX.NO:9 CREATION OF DATAFRAME FOR SORTING AND RANKING SALES DATA

PROGRAM

import pandas as pd
# Step 1: Create sample product sales data
data = {
'Product': ['Laptop', 'Smartphone', 'Tablet', 'Monitor', 'Keyboard', 'Mouse', 'Printer'],
'Units_Sold': [150, 300, 120, 180, 400, 500, 90],
'Unit_Price': [700, 500, 300, 200, 50, 25, 150]
}

# Create DataFrame
df = pd.DataFrame(data)
# Step 2: Calculate Sales Amount
df['Sales_Amount'] = df['Units_Sold'] * df['Unit_Price']

# Step 3: Sort by Sales Amount in descending order

df_sorted = df.sort_values(by='Sales_Amount', ascending=False)

# Step 4: Rank products based on Sales Amount

df_sorted['Sales_Rank'] = df_sorted['Sales_Amount'].rank(ascending=False,
method='dense').astype(int)
# Display the final DataFrame
print("Product Sales Data Ranked:\n", df_sorted)


OUTPUT

Product Sales Data Ranked:
      Product  Units_Sold  Unit_Price  Sales_Amount  Sales_Rank
1  Smartphone         300         500        150000           1
0      Laptop         150         700        105000           2
3     Monitor         180         200         36000           3
2      Tablet         120         300         36000           3
4    Keyboard         400          50         20000           4
6     Printer          90         150         13500           5
5       Mouse         500          25         12500           6
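Monitor and Tablet both have a sales amount of 36000, so the choice of ranking method decides what happens after the tie. A minimal sketch comparing `method='dense'` (used in the program) with `method='min'`, using the sales amounts computed from the data above:

```python
import pandas as pd

sales = pd.Series(
    [150000, 105000, 36000, 36000, 20000, 13500, 12500],
    index=['Smartphone', 'Laptop', 'Monitor', 'Tablet', 'Keyboard', 'Printer', 'Mouse'],
)

ranks = pd.DataFrame({
    # dense: tied items share a rank and the next rank follows without a gap
    'dense': sales.rank(ascending=False, method='dense').astype(int),
    # min: tied items share the lowest rank and the next rank skips ahead
    'min': sales.rank(ascending=False, method='min').astype(int),
})
print(ranks)
```

With `dense`, Keyboard is ranked 4 directly after the tied pair; with `min`, it is ranked 5 because two products already occupy third place.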


EX.NO:10 FINDING OF HIGHEST RAINFALL AREA IN TAMIL NADU

PROGRAM

import pandas as pd

import matplotlib.pyplot as plt

# --- 1. Load the data ---

data = {
    'District': ['Chennai', 'Coimbatore', 'Madurai', 'Tiruchirappalli', 'Kanyakumari', 'Vellore'],
    'November_Rainfall_mm': [450, 120, 180, 150, 500, 100]
}

df = pd.DataFrame(data)

df['District'] = df['District'].str.title() # Normalize casing

# --- 2. Find the highest rainfall area in November (handle ties) ---

max_rainfall = df['November_Rainfall_mm'].max()

highest_rainfall_areas = df[df['November_Rainfall_mm'] == max_rainfall]

print("District(s) with highest rainfall in November:")

for _, row in highest_rainfall_areas.iterrows():
    print(f"- {row['District']} ({row['November_Rainfall_mm']} mm)")

print()

# --- 3. Create a bar chart ---

plt.figure(figsize=(10, 6))

bars = plt.bar(df['District'], df['November_Rainfall_mm'], color='skyblue')

plt.xlabel('District')

plt.ylabel('Rainfall (mm)')

plt.title('November Rainfall in Tamil Nadu Districts')

plt.xticks(rotation=45, ha='right')

# Add value labels to each bar

for bar in bars:
    height = bar.get_height()
    plt.text(bar.get_x() + bar.get_width()/2, height + 10, f"{height} mm",
             ha='center', va='bottom', fontsize=9)

plt.tight_layout()

plt.show()

# --- 4. Calculate rainfall percentage for a specific district ---

specific_district = "chennai" # Can be lowercase or mixed case

specific_district = specific_district.title()

if specific_district in df['District'].values:
    total_rainfall = df['November_Rainfall_mm'].sum()
    district_rainfall = df[df['District'] == specific_district]['November_Rainfall_mm'].iloc[0]
    rainfall_percentage = (district_rainfall / total_rainfall) * 100
    print(f"Rainfall percentage for {specific_district} in November: {rainfall_percentage:.2f}%")
else:
    print(f"District '{specific_district}' not found in the dataset.")
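The percentage in step 4 is the district's rainfall divided by the total over all districts. A minimal sketch that computes the share for every district at once, using the same data as the program (the column name `Share_pct` and the variable `wettest` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'District': ['Chennai', 'Coimbatore', 'Madurai', 'Tiruchirappalli', 'Kanyakumari', 'Vellore'],
    'November_Rainfall_mm': [450, 120, 180, 150, 500, 100],
})

total = df['November_Rainfall_mm'].sum()   # 1500 mm over the six districts
df['Share_pct'] = (df['November_Rainfall_mm'] / total * 100).round(2)

# idxmax returns the row label of the first maximum only, so ties are not reported.
wettest = df.loc[df['November_Rainfall_mm'].idxmax(), 'District']

print(df)
print(wettest)
```

For Chennai this gives 450 / 1500 = 30.00%. Unlike the comparison against `max()` in step 2, `idxmax` would report only one district if two shared the highest value.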

OUTPUT


EX.NO:11 DIFFERENTIATION OF FAST AND LEAST MOVING ITEMS IN THE SHOP

PROGRAM

import pandas as pd

import matplotlib.pyplot as plt

import seaborn as sns

# Load the Supermarket dataset

df = pd.read_csv('/content/SuperMarket Analysis.csv')

# Show available columns

print("Columns in dataset:\n", df.columns)

# Group by item/product name and sum quantities sold

item_sales = df.groupby('Product line')['Quantity'].sum().sort_values(ascending=False)

# Identify fast-moving and least-moving items

fast_moving = item_sales.head(10)

least_moving = item_sales.tail(10)

# Plot fast and least moving items side-by-side

plt.figure(figsize=(14, 6))

# Fast-moving items

plt.subplot(1, 2, 1)

sns.barplot(x=fast_moving.values, y=fast_moving.index, palette='Greens_r')

plt.title('Top 10 Fast-Moving Items')

plt.xlabel('Total Quantity Sold')

plt.ylabel('Item')


# Least-moving items

plt.subplot(1, 2, 2)

sns.barplot(x=least_moving.values, y=least_moving.index, palette='Reds_r')

plt.title('Bottom 10 Least-Moving Items')

plt.xlabel('Total Quantity Sold')

plt.ylabel('Item')

plt.tight_layout()

plt.show()

# ----------------------------

# Stock availability subplot

# ----------------------------

# Assuming 'Stock' column is available for current stock per item

if 'Stock' in df.columns:
    # Group by the same column used for the sales totals above
    item_stock = df.groupby('Product line')['Stock'].sum().sort_values(ascending=False)
    plt.figure(figsize=(14, 6))
    sns.barplot(x=item_stock.index, y=item_stock.values, palette='Blues_r')
    plt.xticks(rotation=90)
    plt.title('Stock Available per Item in Store')
    plt.xlabel('Item')
    plt.ylabel('Available Stock')
    plt.tight_layout()
    plt.show()
else:
    print("\nColumn 'Stock' not found in the dataset. Please ensure a 'Stock' column exists for stock-level visualization.")
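NOTE: head(10) and tail(10) return overlapping rows whenever the grouped result has fewer than 20 entries, so the same item can appear as both fast-moving and least-moving. One possible way to avoid this is sketched below in plain Python; the item names, quantities, and the helper split_fast_and_least are hypothetical.

```python
def split_fast_and_least(quantity_by_item, n=10):
    """Return (fast, least) lists of (item, quantity) pairs that never overlap."""
    ranked = sorted(quantity_by_item.items(), key=lambda pair: pair[1], reverse=True)
    n = min(n, len(ranked) // 2)  # cap n so the two groups cannot share an item
    fast = ranked[:n]
    least = ranked[len(ranked) - n:]
    return fast, least


# Hypothetical totals for six items
quantities = {"Item A": 971, "Item B": 952, "Item C": 920,
              "Item D": 911, "Item E": 902, "Item F": 854}

fast, least = split_fast_and_least(quantities, n=10)
print("Fast-moving:", fast)
print("Least-moving:", least)
```

With six items and n=10, the cap reduces n to 3, so each group holds three distinct items.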

OUTPUT

EX.NO:12 CREATION OF PIE CHART TO DISPLAY FOOD ITEMS

PROGRAM

import matplotlib.pyplot as plt

# Sample data

food_items = ['Apples', 'Bread', 'Milk', 'Rice', 'Eggs']

prices = [120, 40, 60, 80, 50]

# Pie chart

plt.figure(figsize=(6,6))

plt.pie(prices, labels=food_items, autopct='%1.1f%%', startangle=140)

plt.title('Price Distribution of Food Items in Market')

plt.axis('equal') # Ensures the pie is a circle

plt.show()
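NOTE: The labels drawn by autopct are each item's price as a percentage of the total of all prices. The short sketch below reproduces that arithmetic for the same sample data in plain Python; the variable names total and shares are illustrative.

```python
food_items = ['Apples', 'Bread', 'Milk', 'Rice', 'Eggs']
prices = [120, 40, 60, 80, 50]

total = sum(prices)
# Share of each item in the total, in percent
shares = {item: price / total * 100 for item, price in zip(food_items, prices)}

for item, share in shares.items():
    print(f"{item}: {share:.1f}%")
```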

OUTPUT

EX.NO:13 CREATION OF LINEAR REGRESSION FOR STUDENT’S INTERNAL MARKS

PROGRAM

# Import libraries

import pandas as pd

import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split

from sklearn.linear_model import LinearRegression

from sklearn.metrics import mean_squared_error, r2_score

# Load dataset

df = pd.read_csv("StudentAcademicData.csv - Sheet1.csv")

# Extract relevant columns

df = df[['Internal marks (out of 20)', 'External marks (out of 80)']].dropna()

df.columns = ['Internal_Marks', 'External_Marks']

# Split data into features and target

X = df[['Internal_Marks']]

y = df['External_Marks']

# Train-test split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train model

model = LinearRegression()

model.fit(X_train, y_train)

# Make predictions

y_pred = model.predict(X_test)

# Evaluation

print("Mean Squared Error:", mean_squared_error(y_test, y_pred))

print("R-squared Score:", r2_score(y_test, y_pred))

# Predict new result

# Example internal mark, passed with the same column name used in training
new_mark = pd.DataFrame({'Internal_Marks': [15]})

predicted_result = model.predict(new_mark)

print("Predicted Semester Result for Internal Mark 15:", predicted_result[0])

# Plotting

plt.scatter(X, y, color='blue', label='Actual Data')

plt.plot(X, model.predict(X), color='red', label='Regression Line')

plt.xlabel("Internal Marks (out of 20)")

plt.ylabel("Semester Result (External Marks out of 80)")

plt.title("Linear Regression: Internal vs External Marks")

plt.legend()

plt.grid(True)

plt.show()

OUTPUT

Mean Squared Error: 284.50287708929307

R-squared Score: -0.007268107945806568

Predicted Semester Result for Internal Mark 15: 53.68967909800521
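NOTE: The R-squared score of about -0.007 is slightly below zero, which means that on the test split the fitted line predicted external marks marginally worse than always predicting their mean. The sketch below computes R-squared by hand to show where the sign comes from; the marks are hypothetical and r2_score_manual is an illustrative helper, not the scikit-learn function.

```python
def r2_score_manual(y_true, y_pred):
    """R-squared = 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [40, 50, 60, 70]      # hypothetical external marks
mean_only = [55, 55, 55, 55]   # always predict the mean of y_true
reversed_trend = [70, 60, 50, 40]  # predictions that get the trend backwards

print("Mean-only baseline:", r2_score_manual(y_true, mean_only))
print("Reversed trend:", r2_score_manual(y_true, reversed_trend))
print("Perfect fit:", r2_score_manual(y_true, y_true))
```

A score of 0 matches the mean-only baseline, 1 is a perfect fit, and any negative value indicates a model that does worse than the baseline.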

EX.NO:14 CREATION OF CLUSTER USING MACHINE LEARNING ALGORITHM

PROGRAM

import pandas as pd

from sklearn.feature_extraction.text import TfidfVectorizer

from sklearn.cluster import KMeans

from sklearn.metrics import silhouette_score

import matplotlib.pyplot as plt

# Step 1: Load the dataset

df = pd.read_csv("salaries.csv") # Update with your filename

# Step 2: Clean and combine text fields

# Check for the existence of 'job_title' column before using it

if 'job_title' in df.columns:
    df.dropna(subset=["job_title"], how='all', inplace=True)
    df["text"] = df["job_title"].fillna('')
else:
    # Stop here: every later step needs the 'text' column built from 'job_title'
    raise KeyError("'job_title' column not found in the DataFrame.")

# Step 3: TF-IDF Vectorization

vectorizer = TfidfVectorizer(stop_words='english', max_features=1000)

X = vectorizer.fit_transform(df["text"])

# Step 4: Apply KMeans Clustering

n_clusters = 4

kmeans = KMeans(n_clusters=n_clusters, random_state=42, n_init='auto')

df["cluster"] = kmeans.fit_predict(X)

# Step 5: Print Top Terms per Cluster

def print_top_keywords_per_cluster(model, vectorizer, n_terms=10):
    terms = vectorizer.get_feature_names_out()
    for i, center in enumerate(model.cluster_centers_):
        # Indices of the n_terms largest centroid weights, highest first
        top_terms = center.argsort()[-n_terms:][::-1]
        print(f"\n Cluster {i} Top Keywords:")
        print(", ".join(terms[top_terms]))

print_top_keywords_per_cluster(kmeans, vectorizer)

# Step 6: Display Sample Job Titles per Cluster

for i in range(n_clusters):
    print(f"\n Cluster {i} - Sample Job Offers:")
    # Check for the existence of 'company' and 'job_title' columns before using them
    if 'company' in df.columns and 'job_title' in df.columns:
        print(df[df["cluster"] == i][["company", "job_title"]].head(3))
    elif 'job_title' in df.columns:
        print(df[df["cluster"] == i][["job_title"]].head(3))
    else:
        print("Error: 'company' or 'job_title' column not found in the DataFrame.")

# Step 7: Optional Visualization - Cluster Size

plt.figure(figsize=(6, 4))

df["cluster"].value_counts().sort_index().plot(kind='bar', color='skyblue')

plt.title(" Number of Job Offers per Cluster")

plt.xlabel("Cluster")

plt.ylabel("Count")

plt.xticks(rotation=0)

plt.grid(True)

plt.tight_layout()

plt.show()

# Step 8: Optional - Evaluate Clustering Quality

sil_score = silhouette_score(X, df["cluster"])

print(f"\n Silhouette Score: {sil_score:.3f}")

# Step 9: Optional - Save the results

df.to_csv("clustered_ml_jobs.csv", index=False)
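NOTE: Step 3 turns each job title into a vector of TF-IDF weights before clustering. The sketch below works through that weighting in plain Python, assuming scikit-learn's default formula (smoothed idf = ln((1 + n) / (1 + df)) + 1, followed by L2 normalisation of each row). It splits on whitespace only and skips stop-word removal, so it is a simplified illustration, not a replacement for TfidfVectorizer; the job titles are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocabulary, rows) where each row is an L2-normalised TF-IDF vector."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        weights = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(w * w for w in weights)) or 1.0
        rows.append([w / norm for w in weights])
    return vocab, rows


titles = ["Data Scientist", "Data Analyst", "Software Engineer"]  # hypothetical
vocab, rows = tfidf(titles)
for title, row in zip(titles, rows):
    print(title, [round(w, 3) for w in row])
```

In the first title, "scientist" receives a higher weight than "data" because "data" also occurs in another title. This is why rarer words tend to dominate the cluster keywords.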

OUTPUT

Cluster 0 Top Keywords:

executive, account, enterprise, visualization, writer, web, trainee, trader, technology, technologist

Cluster 1 Top Keywords:

engineer, software, data, learning, machine, analytics, ai, research, systems, intelligence

Cluster 2 Top Keywords:

manager, product, engineering, data, analytics, governance, operations, intelligence, business, ai

Cluster 3 Top Keywords:

data, scientist, analyst, architect, research, associate, developer, applied, specialist, consultant

Cluster 0 - Sample Job Offers:

job_title

1437 Executive

1438 Executive

1982 Account Executive

Cluster 1 - Sample Job Offers:

job_title

8 Software Engineer

9 Software Engineer

10 Machine Learning Engineer

Cluster 2 - Sample Job Offers:

job_title

24 Manager

25 Manager

72 Manager

Cluster 3 - Sample Job Offers:

job_title

0 Analyst

1 Analyst

2 Data Quality Lead

EX.NO:15 SALARY PREDICTION USING MACHINE LEARNING ALGORITHM

PROGRAM

import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split

from sklearn.preprocessing import OneHotEncoder, StandardScaler

from sklearn.compose import ColumnTransformer

from sklearn.pipeline import Pipeline

from sklearn.ensemble import RandomForestRegressor

from sklearn.metrics import mean_squared_error

# Simulated dataset
data = pd.DataFrame({
    'experience': [1, 3, 5, 7, 10, 12],
    'education_level': ['Bachelors', 'Masters', 'Masters', 'PhD', 'PhD', 'Masters'],
    'location': ['Mumbai', 'Delhi', 'Bangalore', 'Chennai', 'Hyderabad', 'Pune'],
    'skills_score': [65, 70, 80, 85, 90, 95],
    'salary': [600000, 800000, 1200000, 1600000, 2000000, 2200000]
})

# Features and target
X = data.drop('salary', axis=1)
y = data['salary']

# Preprocessing: scale numeric columns, one-hot encode categorical columns
preprocessor = ColumnTransformer(transformers=[
    ('num', StandardScaler(), ['experience', 'skills_score']),
    ('cat', OneHotEncoder(handle_unknown='ignore'), ['education_level', 'location'])
])

# Pipeline
model = Pipeline(steps=[
    ('preprocess', preprocessor),
    ('regressor', RandomForestRegressor(n_estimators=100, random_state=42))
])

# Train-test split for visualization
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# 1. Actual vs Predicted

plt.figure(figsize=(8,5))

plt.scatter(y_test, y_pred, color='blue')

plt.plot([min(y_test), max(y_test)], [min(y_test), max(y_test)], color='red', linestyle='--')

plt.xlabel('Actual Salary')

plt.ylabel('Predicted Salary')

plt.title('Actual vs Predicted Salary')

plt.grid(True)

plt.tight_layout()

plt.show()

# 2. Residuals (Error)

errors = y_test - y_pred

plt.figure(figsize=(8,5))

plt.bar(range(len(errors)), errors, color='orange')

plt.xlabel('Test Instance')

plt.ylabel('Prediction Error (₹)')

plt.title('Prediction Errors per Test Case')

plt.grid(True)

plt.tight_layout()

plt.show()

# 3. Feature Importance

rf_model = model.named_steps['regressor']

# Names of the transformed columns (scaled numeric features plus one-hot categories)
feature_names = model.named_steps['preprocess'].get_feature_names_out()

importance = rf_model.feature_importances_

plt.figure(figsize=(10,5))

plt.bar(feature_names, importance)

plt.xticks(rotation=45, ha='right')

plt.title('Feature Importance Scores')

plt.xlabel('Feature')

plt.ylabel('Importance')

plt.grid(True)

plt.tight_layout()

plt.show()
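NOTE: The program imports mean_squared_error but only plots the raw errors. The sketch below shows how the residuals could be condensed into a mean squared error and its square root (RMSE) using plain Python; the two actual and predicted salaries are hypothetical and do not come from the model above.

```python
import math

y_test = [800000, 2000000]   # hypothetical actual salaries
y_pred = [950000, 1800000]   # hypothetical model predictions

# Residual = actual - predicted, the same quantity plotted in the error bar chart
errors = [actual - predicted for actual, predicted in zip(y_test, y_pred)]

mse = sum(e ** 2 for e in errors) / len(errors)
rmse = math.sqrt(mse)  # same unit as the salary, so easier to interpret

print("Errors:", errors)
print(f"MSE: {mse:,.0f}")
print(f"RMSE: {rmse:,.2f}")
```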

OUTPUT

EX.NO:16 CONTENT BEYOND SYLLABUS

SENTIMENT ANALYSIS USING LEXICON CLASSIFICATION ALGORITHM

PROGRAM

import numpy as np

import pandas as pd

import matplotlib.pyplot as plt

import seaborn as sns

import re

%matplotlib inline
import warnings
warnings.filterwarnings("ignore")
import os
print(os.listdir("../input"))
pd.set_option('display.max_columns', None)
# on_bad_lines='skip' replaces error_bad_lines=False, which was removed in pandas 2.0
US_comments = pd.read_csv('../input/youtube/UScomments.csv', on_bad_lines='skip')
US_videos = pd.read_csv('../input/youtube/USvideos.csv', on_bad_lines='skip')
US_videos.head()
US_videos.shape
US_videos.nunique()
US_videos.info()
US_videos.head()
US_comments.head()
US_comments.shape
US_comments.isnull().sum()
US_comments.dropna(inplace=True)

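US_comments.isnull().sum()
US_comments.shape
US_comments.nunique()
US_comments.info()
US_comments.drop(41587, inplace=True)
US_comments = US_comments.reset_index().drop('index', axis=1)
US_comments.likes = US_comments.likes.astype(int)
US_comments.replies = US_comments.replies.astype(int)
US_comments.head()
# Keep only letters and '#'. regex=True makes pandas treat the pattern as a
# regular expression (the default changed to literal matching in pandas 2.0).
US_comments['comment_text'] = US_comments['comment_text'].str.replace("[^a-zA-Z#]", " ", regex=True)
# Drop words of three characters or fewer
US_comments['comment_text'] = US_comments['comment_text'].apply(lambda x: ' '.join([w for w in x.split() if len(w) > 3]))
US_comments['comment_text'] = US_comments['comment_text'].apply(lambda x: x.lower())
tokenized_tweet = US_comments['comment_text'].apply(lambda x: x.split())
tokenized_tweet.head()
import nltk
from nltk.stem import WordNetLemmatizer
from nltk.corpus import stopwords
nltk.download('stopwords')
nltk.download('wordnet')
wnl = WordNetLemmatizer()
stop_words = set(stopwords.words('english'))  # build the set once, not once per token
# Note: the result of this call is not assigned, so tokenized_tweet stays unchanged
# (the head() output is identical before and after). Assign the result back to
# tokenized_tweet to actually apply lemmatization and stop-word removal.
tokenized_tweet.apply(lambda x: [wnl.lemmatize(i) for i in x if i not in stop_words])
tokenized_tweet.head()
# Join the tokens back into strings: the sentiment analyzer and the word clouds
# that follow expect text, not lists of tokens
US_comments['comment_text'] = tokenized_tweet.apply(' '.join)
nltk.download('vader_lexicon')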

from nltk.sentiment.vader import SentimentIntensityAnalyzer
sia = SentimentIntensityAnalyzer()
# Compound score in [-1, 1] for each comment
US_comments['Sentiment Scores'] = US_comments['comment_text'].apply(lambda x: sia.polarity_scores(x)['compound'])
US_comments.head()
US_comments['Sentiment'] = US_comments['Sentiment Scores'].apply(lambda s: 'Positive' if s > 0 else ('Neutral' if s == 0 else 'Negative'))
US_comments.head()
US_comments.Sentiment.value_counts()

# Percentage of positive comments for each video
video_ids = US_comments.video_id.unique()
videos = []
for i in range(0, len(video_ids)):
    a = US_comments[(US_comments.video_id == video_ids[i]) & (US_comments.Sentiment == 'Positive')].count().iloc[0]
    b = US_comments[US_comments.video_id == video_ids[i]]['Sentiment'].value_counts().sum()
    Percentage = (a / b) * 100
    videos.append(round(Percentage, 2))
Positivity = pd.DataFrame(videos, video_ids).reset_index()
Positivity.columns = ['video_id', 'Positive Percentage']
Positivity.head()

# Look up the channel title for each video
channels = []
for i in range(0, Positivity.video_id.nunique()):
    channels.append(US_videos[US_videos.video_id == Positivity.video_id.unique()[i]]['channel_title'].unique()[0])
Positivity['Channel'] = channels
Positivity.head()
Positivity[Positivity['Positive Percentage'] == Positivity['Positive Percentage'].max()]

Positivity[Positivity['Positive Percentage'] == Positivity['Positive Percentage'].min()]

all_words = ' '.join([text for text in US_comments['comment_text']])

from wordcloud import WordCloud

wordcloud = WordCloud(width=800, height=500, random_state=21,

max_font_size=110).generate(all_words)

plt.figure(figsize=(10, 7))

plt.imshow(wordcloud, interpolation="bilinear")

plt.axis('off')

plt.show()

all_words_posi = ' '.join([text for text in

US_comments['comment_text'][US_comments.Sentiment == 'Positive']])

wordcloud_posi = WordCloud(width=800, height=500, random_state=21,

max_font_size=110).generate(all_words_posi)

plt.figure(figsize=(10, 7))

plt.imshow(wordcloud_posi, interpolation="bilinear")

plt.axis('off')

plt.show()

all_words_nega = ' '.join([text for text in

US_comments['comment_text'][US_comments.Sentiment == 'Negative']])

wordcloud_nega = WordCloud(width=800, height=500, random_state=21,

max_font_size=110).generate(all_words_nega)

plt.figure(figsize=(10, 7))

plt.imshow(wordcloud_nega, interpolation="bilinear")

plt.axis('off')

plt.show()

all_words_neu = ' '.join([text for text in

US_comments['comment_text'][US_comments.Sentiment == 'Neutral']])

wordcloud_neu = WordCloud(width=800, height=500, random_state=21,

max_font_size=110).generate(all_words_neu)

plt.figure(figsize=(10, 7))

plt.imshow(wordcloud_neu, interpolation="bilinear")

plt.axis('off')

plt.show()
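NOTE: Lexicon-based classification scores a comment by looking up its words in a dictionary of sentiment values and then labels it by the sign of the total. The sketch below illustrates the idea with a tiny hand-made lexicon and the same Positive / Neutral / Negative rule used in the program. The lexicon, comments, and helper names are hypothetical; VADER's real lexicon and scoring rules (negation, intensifiers, normalisation) are considerably richer.

```python
# Toy lexicon for illustration only; these values are not taken from VADER.
TOY_LEXICON = {"love": 2.0, "great": 1.5, "good": 1.0,
               "bad": -1.0, "awful": -1.5, "hate": -2.0}


def score(text):
    """Sum the lexicon values of the words in the text (unknown words count as 0)."""
    return sum(TOY_LEXICON.get(word, 0.0) for word in text.lower().split())


def label(s):
    """Same thresholding rule as the program: the sign of the score decides the class."""
    return 'Positive' if s > 0 else ('Neutral' if s == 0 else 'Negative')


comments = ["love this great video", "awful audio hate it", "uploaded yesterday"]
labels = [label(score(c)) for c in comments]

for comment, lab in zip(comments, labels):
    print(f"{comment!r}: {lab}")
```

A comment with no lexicon words scores exactly 0 and is labelled Neutral, which is one reason a large share of short comments can end up in the Neutral class.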

OUTPUT

['youtube']

(7992, 11)

video_id 2364

title 2398

channel_title 1230

category_id 16

tags 2204

likes 6624

dislikes 2531

comment_total 4152

thumbnail_link 2364

date 40

dtype: int64

RangeIndex: 7992 entries, 0 to 7991

Data columns (total 11 columns):

video_id 7992 non-null object

title 7992 non-null object

channel_title 7992 non-null object

category_id 7992 non-null int64

tags 7992 non-null object

views 7992 non-null int64

likes 7992 non-null int64

dislikes 7992 non-null int64

comment_total 7992 non-null int64

thumbnail_link 7992 non-null object

date 7992 non-null float64

dtypes: float64(1), int64(5), object(5)

memory usage: 686.9+ KB

(691400, 4)

video_id 0

comment_text 25

likes 0

replies 0

dtype: int64

video_id 0

comment_text 0

likes 0

replies 0

dtype: int64

(691375, 4)

video_id 2266

comment_text 434076

likes 1284

replies 479

dtype: int64

Int64Index: 691375 entries, 0 to 691399

Data columns (total 4 columns):

video_id 691375 non-null object

comment_text 691375 non-null object

likes 691375 non-null object

replies 691375 non-null object

dtypes: object(4)

memory usage: 26.4+ MB

0 [logan, paul]

1 [been, following, from, start, your, vine, cha...

2 [kong, maverick]

3 [attendance]

4 [trending]

Name: comment_text, dtype: object

0 [logan, paul]

1 [been, following, from, start, your, vine, cha...

2 [kong, maverick]

3 [attendance]

4 [trending]

Name: comment_text, dtype: object

Positive 305358
Neutral 260986
Negative 125030
Name: Sentiment, dtype: int64

POSITIVE COMMENTS

NEGATIVE COMMENTS

NEUTRAL COMMENTS
