
Data Visualization Techniques Lab Manual

List of Experiments

1. Acquiring and plotting data.

2. Statistical Analysis – such as Multivariate Analysis, PCA, LDA, Correlation, Regression, and Analysis of Variance.

3. Financial analysis using Clustering, Histogram, and Heatmap.

4. Time-series analysis – Stock market.

5. Visualization of various massive datasets – Finance, Healthcare, Census, Geospatial.

6. Visualization on Streaming datasets (Stock market dataset, weather forecasting).

7. Market-Basket Data analysis and visualization.

8. Text visualization using web analytics.

Experiment 1: Acquiring and Plotting Data

Aim:

To acquire data from various sources and plot it using Python/R.

Explanation:

1. Data Acquisition:
o Data can be acquired from various sources such as CSV files, Excel files, databases, or APIs.
o For this example, we create a small dataset manually using pandas.
2. Data Cleaning and Processing:
o Ensure that the data is clean (no missing values, duplicates, etc.).
o Convert the data to a format suitable for plotting.
3. Plotting:
o Use the matplotlib library to create a line plot.
o Label the x-axis and y-axis, and provide a meaningful title.
o Use markers to highlight the points on the graph.

Program:

import pandas as pd
import matplotlib.pyplot as plt

# Step 1: Create sample data
data = pd.DataFrame({
    'x': [1, 2, 3, 4, 5],
    'y': [10, 20, 15, 25, 30]
})

# Step 2: Plot the data
plt.plot(data['x'], data['y'], marker='o', linestyle='-', color='blue', label='Line Plot')

# Step 3: Add labels and title
plt.title('Sample Data Plot')
plt.xlabel('X-Axis')
plt.ylabel('Y-Axis')

# Step 4: Display the plot
plt.legend()
plt.show()

Output:

👉 Line Plot:

• X-Axis: Values → 1, 2, 3, 4, 5
• Y-Axis: Values → 10, 20, 15, 25, 30
• Line Color: Blue
• Markers: Circular points at each data point
• Title: "Sample Data Plot"
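
Note: The explanation above lists CSV files, Excel files, databases, and APIs as possible sources. As a minimal sketch of file-based acquisition (assuming a hypothetical data.csv with columns x and y; the filename is illustrative, not part of the manual's dataset), the manual DataFrame could be replaced by:

import pandas as pd

# Hypothetical file for illustration; substitute the actual path
data = pd.read_csv('data.csv')

The rest of the plotting code works unchanged, since pd.read_csv returns a DataFrame just like the manually constructed one.
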
Experiment 2: Statistical Analysis

Aim:

To perform Principal Component Analysis (PCA) and visualize the reduced dimensions.

Explanation:

1. Purpose of PCA:
o PCA reduces the dimensionality of data while preserving as much variance as possible.
o It helps visualize high-dimensional data in a lower-dimensional space.
2. Steps Involved:
o Load the sample data.
o Apply PCA to reduce the data to 2 principal components.
o Plot the reduced components using a scatter plot.
3. Why PCA is Useful:
o Reduces complexity in data.
o Highlights the most important patterns and relationships.
o Helps in clustering and visualization of high-dimensional data.

Program:

from sklearn.decomposition import PCA
import numpy as np
import matplotlib.pyplot as plt

# Sample data (5 points, 2 dimensions)
# Note: these points are perfectly collinear (y = x + 1), so the second
# component will carry essentially zero variance.
data = np.array([[2, 3], [3, 4], [4, 5], [5, 6], [6, 7]])

# Step 1: Apply PCA
pca = PCA(n_components=2)
transformed = pca.fit_transform(data)

# Step 2: Plot PCA result
plt.scatter(transformed[:, 0], transformed[:, 1], color='red', marker='o')
plt.title('PCA Analysis')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')

# Step 3: Display the plot
plt.grid(True)
plt.show()

Output:

👉 PCA Scatter Plot:

• X-axis → Principal Component 1
• Y-axis → Principal Component 2
• Red points → Reduced 2D data points
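
Note: To quantify how much variance each component preserves, a short hedged addition (reusing the fitted pca object from the program above) is:

# Fraction of the total variance captured by each principal component
print(pca.explained_variance_ratio_)

Because the five sample points are perfectly collinear, the first component captures essentially 100% of the variance and the second close to 0%.
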

Experiment 3: Financial Analysis using Clustering, Histogram, and Heatmap

Aim:

To analyze financial data using clustering, histogram, and heatmap.

Explanation:

1. Financial Data Overview:
o Financial data includes information like stock prices, trading volumes, and returns.
o Analyzing this data helps understand market trends, correlations, and patterns.
2. Clustering:
o Clustering groups similar financial patterns together (e.g., similar stock behaviors).
o The KMeans clustering algorithm is commonly used; a sketch that plots the resulting clusters follows the output below.
3. Histogram:
o A histogram shows the distribution of financial values (e.g., stock prices).
o It helps identify the frequency of different value ranges.
4. Heatmap:
o A heatmap shows correlations between financial variables.
o High correlation values (near +1 or -1) indicate strong relationships between variables.

Program:

import seaborn as sns
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Step 1: Create sample financial data
data = pd.DataFrame({
    'Price': np.random.rand(10) * 100,
    'Volume': np.random.rand(10) * 1000
})

# Step 2: Clustering using KMeans
kmeans = KMeans(n_clusters=2, n_init=10)
data['Cluster'] = kmeans.fit_predict(data[['Price', 'Volume']])

# Step 3: Plot histogram of Price
sns.histplot(data['Price'], bins=5, color='skyblue')
plt.title('Price Distribution')
plt.show()

# Step 4: Plot heatmap for correlation (Price and Volume only, so the
# cluster labels do not distort the correlation matrix)
sns.heatmap(data[['Price', 'Volume']].corr(), annot=True, cmap='coolwarm')
plt.title('Heatmap of Financial Data')
plt.show()

Output:

👉 Histogram:

• X-axis → Price range
• Y-axis → Frequency of occurrence
• Color → Sky blue

👉 Heatmap:

• Shows correlation between Price and Volume
• Strong positive/negative correlations appear in shades of red/blue
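
Note: The program assigns cluster labels but never draws them. A minimal sketch that visualizes the KMeans result (reusing the data DataFrame from the program above) is:

# Scatter of Price vs Volume, colored by KMeans cluster assignment
plt.scatter(data['Price'], data['Volume'], c=data['Cluster'], cmap='viridis')
plt.title('KMeans Clusters of Financial Data')
plt.xlabel('Price')
plt.ylabel('Volume')
plt.show()
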
Experiment 4: Time-Series Analysis – Stock Market

Aim:

To perform time-series analysis on stock market data and visualize trends over time.

Explanation:

1. Time-Series Data:
o Time-series data is a sequence of data points collected or recorded at specific time intervals (e.g., daily stock prices).
o Analyzing time-series data helps identify patterns, trends, and seasonal effects.
2. Key Concepts:
o Trend: Long-term increase or decrease in values over time.
o Seasonality: Repeating patterns over fixed time intervals (e.g., quarterly cycles).
o Noise: Random variations in the data (a moving-average sketch that separates trend from noise follows the output below).
3. Steps Involved:
o Generate sample stock price data over a period of time.
o Plot the trend using a line graph.
o Highlight trends and patterns.

Program:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Step 1: Create sample time-series data (stock prices)
np.random.seed(0)
dates = pd.date_range(start='2023-01-01', periods=30)  # 30 days of data
prices = 100 + np.cumsum(np.random.randn(30) * 2)  # random walk around 100 simulates a noisy price series

# Step 2: Create DataFrame
data = pd.DataFrame({'Date': dates, 'Stock Price': prices})

# Step 3: Plot time-series data
plt.figure(figsize=(10, 5))
plt.plot(data['Date'], data['Stock Price'], marker='o', linestyle='-', color='green', label='Stock Price')
plt.title('Stock Market Trend')
plt.xlabel('Date')
plt.ylabel('Stock Price')
plt.xticks(rotation=45)

# Step 4: Add legend and grid
plt.legend()
plt.grid(True)
plt.show()

Output:

👉 Time-Series Line Plot:

• X-Axis: Dates (Jan 1, 2023 → Jan 30, 2023)
• Y-Axis: Stock Prices
• Line Color: Green
• Markers: Circular points showing daily values
• Trend: Gradual increase in stock price over time
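
Note: The explanation distinguishes trend from noise; a moving average is one common way to separate them. A minimal sketch (reusing the data DataFrame from the program above, with an assumed 7-day window) is:

# A 7-day rolling mean smooths day-to-day noise and exposes the trend
data['7-Day MA'] = data['Stock Price'].rolling(window=7).mean()
plt.plot(data['Date'], data['Stock Price'], color='lightgray', label='Daily Price')
plt.plot(data['Date'], data['7-Day MA'], color='green', label='7-Day Moving Average')
plt.legend()
plt.show()
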

Experiment 5: Visualization of Various Massive Datasets – Finance, Healthcare, Census, and Geospatial

Aim:

To visualize large datasets using bar plots and scatter plots for analysis in finance, healthcare,
census, and geospatial data.

Explanation:

1. Massive Datasets:
o Large datasets may have millions of records.
o Visualization helps to identify trends, patterns, and outliers.

2. Types of Data:
o Finance: Stock prices, market trends, returns.
o Healthcare: Patient records, disease outbreaks, drug trials.
o Census: Population distribution, demographics, and migration.
o Geospatial: Location-based data, traffic patterns, environmental data.

3. Visualization Techniques:
o Bar Plot: For categorical and comparative data.
o Scatter Plot: For relationship analysis between two variables.

Program:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Step 1: Create sample dataset for the four domains
data = pd.DataFrame({
    'Category': ['Finance', 'Healthcare', 'Census', 'Geospatial'],
    'Value': [120, 150, 80, 100]
})

# Step 2: Bar plot
plt.figure(figsize=(8, 5))
sns.barplot(x='Category', y='Value', data=data, palette='viridis')
plt.title('Data Value Across Different Domains')
plt.xlabel('Category')
plt.ylabel('Value')
plt.show()

# Step 3: Scatter plot for geospatial data
# (longitude is conventionally the x coordinate, latitude the y coordinate)
x = np.random.rand(50) * 100  # simulated longitudes
y = np.random.rand(50) * 100  # simulated latitudes
plt.figure(figsize=(8, 5))
plt.scatter(x, y, color='blue', edgecolors='black')
plt.title('Sample Geospatial Data')
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.grid(True)
plt.show()

Output:

👉 Bar Plot:

• X-axis → Category
• Y-axis → Value
• Color → Viridis color palette

👉 Scatter Plot:

• X-axis → Longitude
• Y-axis → Latitude
• Data points → Random geospatial locations
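
Note: The sample here has only 50 points, but a scatter plot of millions of records overplots into a solid blob. One hedged alternative is matplotlib's hexbin, which bins points into hexagonal cells and colors each cell by count (sketched below on larger simulated data):

# Simulate a larger point cloud and bin it into hexagonal cells
x = np.random.randn(100_000) * 10 + 50
y = np.random.randn(100_000) * 10 + 50
plt.hexbin(x, y, gridsize=40, cmap='viridis')
plt.colorbar(label='Points per cell')
plt.title('Hexbin of a Large Simulated Dataset')
plt.show()
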

Experiment 6: Visualization on Streaming Dataset (Stock Market Dataset, Weather Forecasting)

Aim:

To visualize a real-time streaming dataset for stock market data and weather data.

Explanation:

1. Streaming Data:
o Data generated continuously over time (e.g., stock prices, weather updates).
o Requires real-time processing and visualization (a live-updating sketch follows the output below).
2. Types of Streaming Data:
o Stock Market Data: Stock prices, trading volumes, market indices.
o Weather Data: Temperature, humidity, wind speed, and precipitation.
3. Visualization Techniques:
o Line Plot: To track changes over time.
o Scatter Plot: To find relationships between variables.

Program:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Step 1: Create sample streaming stock data (simulated)
np.random.seed(1)
time = pd.date_range(start='2023-01-01', periods=50, freq='D')
stock_prices = 100 + np.cumsum(np.random.randn(50) * 2)  # random walk around 100 simulates prices
temperature = np.random.rand(50) * 15 + 20  # simulated temperature in Celsius (20-35 °C)

# Step 2: Create DataFrame
data = pd.DataFrame({'Date': time, 'Stock Price': stock_prices, 'Temperature': temperature})

# Step 3: Plot stock price as line plot
plt.figure(figsize=(10, 5))
plt.plot(data['Date'], data['Stock Price'], color='blue', label='Stock Price')
plt.title('Stock Market Trend (Streaming Data)')
plt.xlabel('Date')
plt.ylabel('Stock Price')
plt.xticks(rotation=45)
plt.legend()
plt.show()

# Step 4: Plot temperature vs stock price as scatter plot
plt.figure(figsize=(8, 5))
plt.scatter(data['Stock Price'], data['Temperature'], color='red', label='Temperature vs Stock Price')
plt.title('Temperature vs Stock Price')
plt.xlabel('Stock Price')
plt.ylabel('Temperature')
plt.legend()
plt.grid(True)
plt.show()

Output:

👉 Stock Price Line Plot:

• X-Axis: Date
• Y-Axis: Stock Price
• Trend: Blue line showing price changes over time

👉 Temperature vs Stock Price Scatter Plot:

• X-Axis: Stock Price
• Y-Axis: Temperature
• Color: Red data points
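
Note: The program above plots a pre-generated batch rather than a true stream. A minimal sketch of live updating with matplotlib's interactive mode (assuming one simulated price tick arrives per loop iteration) is:

import numpy as np
import matplotlib.pyplot as plt

plt.ion()  # interactive mode: figures redraw without blocking
prices = [100.0]
for _ in range(50):
    prices.append(prices[-1] + np.random.randn() * 2)  # simulated new tick
    plt.cla()  # clear the axes and redraw the updated series
    plt.plot(prices, color='blue')
    plt.title('Live Stock Price (Simulated Stream)')
    plt.pause(0.1)  # short pause lets the GUI event loop refresh
plt.ioff()
plt.show()

A production pipeline would instead read from a real feed (e.g., a websocket or message queue), but the redraw loop is the same idea.
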

Experiment 7: Market Basket Data Analysis and Visualization

Aim:

To analyze and visualize market basket data using association rules and itemset frequency.

Explanation:

1. Market Basket Analysis:
o A technique used to identify patterns in customer purchases.
o Finds combinations of products frequently bought together.
2. Apriori Algorithm:
o Identifies frequent itemsets based on minimum support and confidence.
o Support → frequency of item combinations.
o Confidence → likelihood of purchasing items together.
3. Steps Involved:
o Create sample transaction data.
o Use the Apriori algorithm to find frequent itemsets.
o Visualize the association rules using a heatmap or graph.

Program:

import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
import seaborn as sns
import matplotlib.pyplot as plt

# Step 1: Create sample transaction data (one row per transaction)
data = {
    'Milk': [1, 0, 1, 1, 0, 1],
    'Bread': [1, 1, 1, 0, 1, 1],
    'Butter': [0, 1, 1, 1, 1, 0],
    'Eggs': [1, 1, 0, 0, 1, 1],
    'Cheese': [0, 0, 1, 1, 0, 1]
}
df = pd.DataFrame(data).astype(bool)  # recent mlxtend versions expect boolean columns

# Step 2: Find frequent itemsets using Apriori
frequent_itemsets = apriori(df, min_support=0.3, use_colnames=True)

# Step 3: Generate association rules
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)

# Step 4: Visualize association rules
plt.figure(figsize=(8, 5))
sns.heatmap(rules[['support', 'confidence', 'lift']], cmap='coolwarm', annot=True)
plt.title('Association Rules Heatmap')
plt.show()

Output:

👉 Heatmap:

• Rows: One per association rule; columns: Support, Confidence, Lift
• Color: Strength of the relationship between items
• Interpretation: Higher values indicate stronger associations
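
Note: The heatmap rows are easier to read alongside the rules themselves. A short addition (reusing the rules DataFrame from the program above) prints the strongest rules by lift:

# Show antecedent → consequent pairs ranked by lift
top_rules = rules.sort_values('lift', ascending=False)
print(top_rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']].head())
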

Experiment 8: Text Visualization Using Web Analytics

Aim:

To visualize textual data from web analytics using word clouds and frequency plots.

Explanation:

1. Textual Data:
o Data collected from web logs, search queries, and user comments.
o Often unstructured and needs preprocessing.
2. Text Visualization:
o Word cloud → displays the most frequent words in a dataset.
o Frequency plot → shows the top words and their counts.
3. Steps Involved:
o Create sample web log data.
o Preprocess the text (lowercase it and strip punctuation; a stopword-removal sketch follows the output below).
o Generate a word cloud and frequency plot.

Program:

from wordcloud import WordCloud
import matplotlib.pyplot as plt
from collections import Counter
import seaborn as sns

# Step 1: Create sample text data from web logs
text_data = """
User clicked on product page, user viewed details, user added to cart, user searched for mobile phones,
user purchased item, user viewed related items, user clicked on ads, user left feedback, user searched for tablets,
user returned item, user added item to wishlist, user rated product.
"""

# Step 2: Preprocess text data (lowercase, strip punctuation) and count word frequencies
words = text_data.replace(',', '').replace('.', '').lower().split()
word_counts = Counter(words)

# Step 3: Generate word cloud from the frequency counts
wordcloud = WordCloud(width=800, height=400, background_color='white').generate_from_frequencies(word_counts)

# Step 4: Display word cloud
plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.title('Word Cloud from Web Logs')
plt.show()

# Step 5: Generate frequency plot of the ten most common words
# (Counter.most_common sorts by count; slicing the raw dict would return
# words in insertion order, not frequency order)
top_words = word_counts.most_common(10)
plt.figure(figsize=(8, 5))
sns.barplot(x=[w for w, _ in top_words], y=[c for _, c in top_words], palette='viridis')
plt.title('Top 10 Most Frequent Words in Web Logs')
plt.xlabel('Words')
plt.ylabel('Frequency')
plt.show()

Output:

👉 Word Cloud:

• Displays the most common terms from the data.
• Size of the word = frequency of occurrence.

👉 Frequency Plot:

• Top 10 most frequent terms.
• X-Axis → Words
• Y-Axis → Frequency
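
Note: The explanation mentions stopword removal, which the program above skips. A minimal sketch using the STOPWORDS set bundled with the wordcloud library (reusing the words list from the program) is:

from wordcloud import STOPWORDS
from collections import Counter

# Drop common filler words ("on", "for", "to", ...) before counting
filtered = [w for w in words if w not in STOPWORDS]
word_counts = Counter(filtered)

Regenerating the word cloud and frequency plot from this filtered counter then emphasizes content words such as product names and user actions.
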
