DVT Exp 3

The document outlines a Python script for analyzing financial data using clustering techniques. It includes loading data, handling missing values, standardizing the data, applying K-Means clustering, and visualizing results through scatter plots, histograms, and heatmaps. The script focuses on identifying patterns and distributions within financial metrics.

Uploaded by abhilashdopati

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Load financial data
df = pd.read_csv("financial_analysis.csv")

# Display first few rows
print(df.head())

# Select numeric columns only (excludes non-numeric ones, e.g., 'Company')
numeric_cols = df.select_dtypes(include=['number']).columns

# Fill missing values in numeric columns with the column mean
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].mean())

# Standardize data for clustering
scaler = StandardScaler()
scaled_data = scaler.fit_transform(df[numeric_cols])

# K-Means Clustering
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
df['Cluster'] = kmeans.fit_predict(scaled_data)

# Scatter plot of clusters (using the first two numeric columns)
plt.figure(figsize=(8, 6))
sns.scatterplot(x=df[numeric_cols[0]], y=df[numeric_cols[1]], hue=df['Cluster'],
                palette='viridis')
plt.xlabel(numeric_cols[0])
plt.ylabel(numeric_cols[1])
plt.title("Clustering Analysis")
plt.show()

# Histogram of a financial metric (e.g., Revenue)
plt.figure(figsize=(8, 6))
sns.histplot(df['Revenue'], bins=30, kde=True, color='blue')
plt.title("Revenue Distribution")
plt.show()

# Heatmap of feature correlations
plt.figure(figsize=(10, 6))
sns.heatmap(df[numeric_cols].corr(), annot=True, cmap="coolwarm", linewidths=0.5)
plt.title("Correlation Heatmap")
plt.show()
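
The script above fixes n_clusters=3 without showing how that value was chosen. One common way to sanity-check such a choice is to compare silhouette scores across several values of k. The sketch below is not part of the original script: it runs the same impute, scale, and cluster steps on a small synthetic table, because financial_analysis.csv is not available here. The `demo` DataFrame, the `Profit` and `Debt` columns, and the generated numbers are all hypothetical stand-ins.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for financial_analysis.csv: three loose groups of companies.
rng = np.random.default_rng(42)
centers = np.array([[100.0, 10.0, 50.0],
                    [500.0, 80.0, 200.0],
                    [900.0, 30.0, 600.0]])
values = np.vstack([c + rng.normal(scale=20.0, size=(30, 3)) for c in centers])
demo = pd.DataFrame(values, columns=["Revenue", "Profit", "Debt"])
demo.insert(0, "Company", [f"Company {i}" for i in range(len(demo))])
demo.loc[[3, 40, 75], "Profit"] = np.nan  # simulate a few missing values

# Same preprocessing as the script: numeric columns only, mean imputation, scaling.
numeric_cols = demo.select_dtypes(include=["number"]).columns
demo[numeric_cols] = demo[numeric_cols].fillna(demo[numeric_cols].mean())
scaled = StandardScaler().fit_transform(demo[numeric_cols])

# Fit K-Means for several cluster counts and record the silhouette score of each.
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, random_state=42, n_init=10).fit_predict(scaled)
    scores[k] = silhouette_score(scaled, labels)

for k, score in scores.items():
    print(f"k={k}: silhouette={score:.3f}")

best_k = max(scores, key=scores.get)
print("Highest silhouette score at k =", best_k)
```

To apply the same check to the real data, the loop can be run on scaled_data from the script instead of the synthetic `scaled` array. A higher silhouette score suggests better-separated clusters, but it is only one heuristic and does not replace domain judgment about how many groups make sense.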
