
Visualization


Data mining knowledge representation: Visualization techniques.

In data mining, knowledge representation refers to how the discovered
patterns, associations, or insights are presented to users in a way that is
comprehensible and actionable. Since mined knowledge is often complex
(rules, clusters, decision trees, networks, etc.),
visualization techniques are widely used to represent it.

1. What is Visualization?
 Visualization is the process of representing data graphically so that
patterns, trends, and insights become easier to understand.
 In data mining / data analysis, visualization is especially important
because datasets are often large, complex, and multi-dimensional.

2. Motivation for Visualization


Why do we use visualization in data mining and analytics?
 Humans process visual information faster than raw numbers.
 It helps discover hidden patterns and relationships in large datasets.
 It makes comparisons easier (e.g., sales across regions).
 It communicates results effectively to non-technical users.
 It helps with high-dimensional data, where inspecting the raw numbers is
impractical.

3. General Concepts of Visualization Techniques


Visualization techniques can be classified based on:
1. Data Type:
o One-dimensional (histogram, line chart)
o Two-dimensional (scatter plot, heatmap)
o Multi-dimensional (parallel coordinates, radar charts, projections)

2. Representation Style:
o Geometric (scatter plots, projections)
o Icon-based (glyphs, Chernoff faces)
o Pixel-oriented (each value mapped to a pixel, used for large data)
o Hierarchical (tree maps, dendrograms)
o Graph-based (network visualization)

3. Techniques of Data Visualization
Common visualization techniques can be broadly categorized into a few main types:
1. Charts and Graphs:
Foundational tools for comparing data, showing trends, and
understanding distributions.
 Bar Charts: Ideal for comparing quantities across different
categories.
 Line Charts: Excellent for illustrating trends and changes over
time.
 Pie Charts: Used to show the proportion of a whole.
 Scatter Plots: Help in analyzing the relationship between two
variables.
 Histograms: Display the distribution of a dataset.
 Box Plots: Visualize data distribution, identifying medians,
quartiles, and outliers.
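
As a rough illustration of what a box plot actually draws, the sketch below computes the median, quartiles, whisker ends, and outliers for a small list of numbers. The sample values are made up, and the 1.5 × IQR rule for flagging outliers is a common convention assumed here, not something these notes specify.

```python
import statistics

def box_plot_summary(values):
    """Return the numbers a box plot displays for one variable."""
    data = sorted(values)
    # Quartiles split the sorted data into four parts.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range: the height of the box
    low_fence = q1 - 1.5 * iqr   # assumed convention for outlier limits
    high_fence = q3 + 1.5 * iqr
    inliers = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inliers[0],    # whiskers end at the extreme inliers
        "whisker_high": inliers[-1],
        "outliers": outliers,
    }

# Hypothetical daily unit sales with one unusually large day.
summary = box_plot_summary([12, 15, 14, 10, 18, 20, 13, 16, 95])
print(summary)
```

The value 95 lies far above the upper fence, so a box plot would draw it as a separate point rather than stretching the whisker to reach it.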

2. Maps:
Used to represent geographical data and highlight spatial
relationships.
 Heatmaps: Visualize the density of data points in a geographical
area or across categories.
 Choropleth Maps: Use colors to represent statistical data over
geographical areas.
3. Hierarchical and Part-to-Whole Visualizations:
Useful for data with nested or layered structures.
 Tree Maps: Represent hierarchical data using nested rectangles,
with the size of each rectangle proportional to its value.
 Sunburst Charts: Show hierarchical data in a radial layout.
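
The idea that rectangle size is proportional to value can be sketched with a simple one-level "slice" layout, which cuts one rectangle into vertical strips. This is only a minimal sketch: the function name and the revenue figures are hypothetical, and real tree map tools usually apply more elaborate layouts and repeat the split for every level of the hierarchy.

```python
def slice_layout(values, x, y, width, height):
    """Split one rectangle into vertical strips with areas proportional to values.

    Returns {name: (x, y, width, height)} for each item.
    """
    total = sum(values.values())
    rects = {}
    offset = x
    for name, value in values.items():
        strip_width = width * value / total  # share of the total width
        rects[name] = (offset, y, strip_width, height)
        offset += strip_width
    return rects

# Hypothetical revenue shares for three product categories.
rects = slice_layout(
    {"Electronics": 50, "Clothing": 30, "Groceries": 20},
    x=0, y=0, width=100, height=60,
)
for name, rect in rects.items():
    print(name, rect)
```

To show a second level (for example, products inside a category), the same function could be applied again inside each strip, typically switching between vertical and horizontal cuts.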

4. Text-Based and Analytical Visualizations:
For understanding textual data and complex relationships.
 Word Clouds: Visually represent textual data by showing the
frequency of words, with more frequent words drawn larger.
 Network Diagrams: Illustrate connections and relationships
between different entities.
 Decision Trees: Visualize the decision-making process by representing a
set of rules.
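
The word cloud idea (frequency mapped to size) can be sketched as follows. The function name, the font size range, and the sample sentence are illustrative assumptions; a real word cloud tool would normally also drop common stop words and handle the layout of the words.

```python
import re
from collections import Counter

def word_cloud_sizes(text, top_n=3, min_size=10, max_size=48):
    """Map the top_n most frequent words to font sizes between min_size and max_size."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words).most_common(top_n)
    if not counts:
        return {}
    highest = counts[0][1]
    lowest = counts[-1][1]
    span = max(highest - lowest, 1)  # avoid dividing by zero when all counts match
    return {
        word: min_size + (count - lowest) * (max_size - min_size) / span
        for word, count in counts
    }

sizes = word_cloud_sizes(
    "data mining finds patterns in data and data mining shows patterns"
)
print(sizes)
```

Here "data" occurs most often, so it receives the largest font size, while less frequent words are drawn smaller.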

Example: Retail Store Sales Data Analysis

Imagine you are analyzing a retail store chain’s sales dataset.
The dataset has attributes like:
 Date (daily/weekly)
 Region (North, South, East, West)
 Product Category (Electronics, Clothing, Groceries, etc.)
 Units Sold
 Revenue
 Customer Age Group

Four charts suit this dataset:
1. Line Chart → Revenue trend over months
2. Bar Chart → Total revenue by product category
3. Scatter Plot → Relationship between units sold and revenue
4. Heatmap → Revenue comparison across regions and categories

[Figure: Line chart of the revenue trend over months]
[Figure: Bar chart of total revenue by product category]
[Figure: Scatter plot of units sold against revenue]
[Figure: Heatmap of revenue across regions and categories]
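
Each of the four charts needs the raw rows arranged differently before anything is drawn. The sketch below shows one possible preparation step using only the Python standard library. The sample rows and variable names are hypothetical, and the actual drawing would be handed to a plotting tool of your choice.

```python
from collections import defaultdict

# Hypothetical sample rows: (month, region, category, units_sold, revenue).
sales = [
    ("2024-01", "North", "Electronics", 10, 5000.0),
    ("2024-01", "South", "Clothing", 40, 2000.0),
    ("2024-02", "North", "Clothing", 30, 1500.0),
    ("2024-02", "South", "Electronics", 12, 6000.0),
    ("2024-03", "North", "Electronics", 8, 4000.0),
]

revenue_by_month = defaultdict(float)     # line chart: x = month, y = revenue
revenue_by_category = defaultdict(float)  # bar chart: one bar per category
revenue_grid = defaultdict(float)         # heatmap: one cell per (region, category)
units_vs_revenue = []                     # scatter plot: one point per row

for month, region, category, units, revenue in sales:
    revenue_by_month[month] += revenue
    revenue_by_category[category] += revenue
    revenue_grid[(region, category)] += revenue
    units_vs_revenue.append((units, revenue))

# Rough text rendering of the bar chart: one '#' per 1000 units of revenue.
for category, total in sorted(revenue_by_category.items()):
    print(f"{category:<12} {'#' * round(total / 1000)} {total:.0f}")
```

The line chart and bar chart use one-dimensional totals, the heatmap uses a two-way table, and the scatter plot keeps every row as its own point, which is why it is the chart best suited to spotting unusual individual sales.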

Categories of Techniques Based on Approach


Data visualization techniques can be broadly classified into five main
categories, depending on the type and complexity of the data to be
represented and on how they process and display it:

1. Pixel-Oriented Visualization Techniques: Map each data value to
a single pixel, using color to represent the value.
 Idea: Each data value is represented as a single colored pixel.
 Use case: Very large datasets (millions of records).
 Examples:
o Recursive Pattern Technique
o Circle Segments Technique
 Practical techniques: Heatmaps, pixel maps
 Advantage: Can display a huge amount of data in a compact space.
 Limitation: Interpretation may be difficult without interaction.
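
A minimal sketch of the pixel-oriented idea follows: every value becomes one colored pixel. It assumes a simple left-to-right, row-by-row arrangement and a blue-to-red color scale, both chosen only for illustration. The Recursive Pattern and Circle Segments techniques named above arrange the pixels differently.

```python
def to_pixel_grid(values, width):
    """Map each value to one (R, G, B) pixel and arrange the pixels in rows."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid dividing by zero when all values are equal
    pixels = []
    for value in values:
        t = (value - lo) / span  # 0.0 for the smallest value, 1.0 for the largest
        # Low values are drawn blue, high values red.
        pixels.append((round(255 * t), 0, round(255 * (1 - t))))
    # Cut the flat pixel list into rows of the requested width.
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]

grid = to_pixel_grid([0, 5, 10, 2, 8, 4], width=3)
for row in grid:
    print(row)
```

With millions of records the same mapping still needs only one pixel per value, which is why this family of techniques scales to very large datasets.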

2. Geometric Projection Techniques: Use mathematical transformations to
project high-dimensional data into a lower-dimensional space; the
result is typically shown as scatter plots or scatter plot matrices.
 Idea: Map multi-dimensional data into 2D or 3D geometric spaces.
 Use case: To explore relationships and clusters in high-dimensional data.
 Examples:
o Scatter plots
o Principal Component Analysis (PCA) plots
o t-SNE and MDS visualizations
 Advantage: Helps to find clusters, correlations, and patterns.
 Limitation: May lose information during dimensionality reduction.
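
To make the projection step concrete, here is a small pure-Python sketch of PCA that reduces each record to two coordinates for a scatter plot. It finds the two leading directions by power iteration with deflation, which is one simple way to do it and is assumed here for illustration; numerical libraries normally use more robust routines. The sample rows are made up, and the sketch does not handle data with no variance.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def center(rows):
    """Subtract each column's mean so the data is centered at the origin."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]

def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n = len(centered)
    cols = list(zip(*centered))
    return [[dot(a, b) / (n - 1) for b in cols] for a in cols]

def leading_eigenvector(matrix, iterations=200):
    """Power iteration: multiply by the matrix and renormalize repeatedly."""
    vec = [1.0] * len(matrix)
    for _ in range(iterations):
        nxt = [dot(row, vec) for row in matrix]
        norm = math.sqrt(dot(nxt, nxt))
        if norm == 0.0:  # degenerate input: no variance left to follow
            break
        vec = [x / norm for x in nxt]
    return vec

def project_2d(rows):
    """Return one (pc1, pc2) pair per row."""
    centered = center(rows)
    cov = covariance(centered)
    first = leading_eigenvector(cov)
    eigenvalue = dot(first, [dot(row, first) for row in cov])
    size = len(cov)
    # Deflation removes the first direction so the second can be found.
    deflated = [[cov[i][j] - eigenvalue * first[i] * first[j]
                 for j in range(size)] for i in range(size)]
    second = leading_eigenvector(deflated)
    return [(dot(row, first), dot(row, second)) for row in centered]

# Hypothetical 3-attribute records; most variation is in the first attribute.
rows = [(-2.0, 1.0, 0.1), (-1.0, -1.0, -0.1), (0.0, 0.0, 0.0),
        (1.0, -1.0, 0.1), (2.0, 1.0, -0.1)]
points = project_2d(rows)
for point in points:
    print(point)
```

The third attribute barely varies in this sample, so dropping it loses little. When the discarded directions carry real variation, the 2D picture can hide structure, which is the information loss noted in the limitation above.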

3. Icon-Based Visualization Techniques: Use icons to represent
multi-dimensional data values.
 Idea: Represent each data object as a small icon/glyph, where different
attributes are encoded as visual features.
 Use case: Multivariate data analysis.
 Examples:
o Chernoff Faces (human face features mapped to variables)
o Star Plots / Radial Graphs
 Advantage: The human brain can easily detect similarities and
differences between icons.
 Limitation: Becomes confusing with too many variables or large
datasets.
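
A star plot glyph can be sketched as a polygon with one spoke per attribute, where each spoke's length encodes the attribute's value. The function name and sample values below are hypothetical, and the sketch assumes non-negative values scaled from 0 up to a known maximum for each attribute.

```python
import math

def star_glyph(values, maxima, radius=1.0):
    """Return the polygon vertices (x, y) of one star glyph.

    Spokes are spread evenly around the circle, one per attribute.
    """
    count = len(values)
    vertices = []
    for i, (value, maximum) in enumerate(zip(values, maxima)):
        angle = 2 * math.pi * i / count        # direction of spoke i
        length = radius * value / maximum      # spoke length encodes the value
        vertices.append((length * math.cos(angle), length * math.sin(angle)))
    return vertices

# One hypothetical record with four attributes, each on a 0 to 100 scale.
vertices = star_glyph([50, 100, 25, 100], maxima=[100, 100, 100, 100])
for vertex in vertices:
    print(vertex)
```

Drawing one such polygon per record and placing the polygons side by side lets similar records show up as similar shapes.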

4. Hierarchical Visualization Techniques: Organize data into a tree
structure, with nodes representing data elements and branches showing
their relationships.
 Idea: Used for data with hierarchical or tree-like structures.
 Use case: Taxonomies, organizational structures, decision processes.
 Examples:
o Tree Maps
o Decision Trees
o Dendrograms
o Cone Trees
 Advantage: Good for showing nested relationships.
 Limitation: Hard to read when hierarchy is very deep.

5. Graph-Based Visualization Techniques


 Idea: Represent data objects as nodes and their relationships as edges.
 Use case: Social networks, web link analysis, knowledge graphs.
 Examples:
o Force-directed Graphs
o Association Graphs
 Advantage: Clearly shows relationships and connectivity.
 Limitation: May become cluttered for very dense graphs.

4. Visualizing Higher-Dimensional Data


Real-world datasets often have more than three attributes (dimensions), but a
screen or page can directly show only two or three spatial dimensions.
Techniques to visualize high-dimensional data include:
 Parallel Coordinates: Each attribute is an axis; lines represent data
points.
 Dimensionality Reduction:
o PCA (Principal Component Analysis)
o t-SNE (t-distributed Stochastic Neighbor Embedding)
o UMAP (Uniform Manifold Approximation and Projection)
 Star/Radar Plots: Each axis is an attribute; data represented as polygons.
 Glyphs & Icons: Represent attributes with shapes, colors, or faces.
 Pixel-Oriented Techniques: Map values to pixel colors across the
screen.
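
For parallel coordinates, each record turns into a polyline that crosses every attribute axis at a height given by its value. The sketch below computes those polylines, assuming each axis is scaled independently from its minimum to its maximum (a common choice, not stated in these notes). The records are made up.

```python
def parallel_coordinates(records):
    """Turn each record into a polyline of (axis_index, height) points.

    Each attribute is rescaled to the range 0..1 on its own axis.
    """
    dims = len(records[0])
    lows = [min(r[j] for r in records) for j in range(dims)]
    highs = [max(r[j] for r in records) for j in range(dims)]
    polylines = []
    for record in records:
        line = []
        for j in range(dims):
            span = highs[j] - lows[j]
            # A constant attribute has no spread; place it mid-axis.
            height = (record[j] - lows[j]) / span if span else 0.5
            line.append((j, height))
        polylines.append(line)
    return polylines

# Hypothetical records: (units sold, revenue, region code).
lines = parallel_coordinates([(10, 200.0, 1), (20, 100.0, 3), (15, 150.0, 2)])
for line in lines:
    print(line)
```

Lines that cross between two neighboring axes suggest a negative relationship between those attributes, while lines that run roughly parallel suggest a positive one.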

5. Do’s and Don’ts of Visualization


✅ Do’s:
 Choose the right visualization for the data type and audience.
 Keep it simple and clear; avoid unnecessary decoration.
 Use consistent color schemes (avoid confusion).
 Label axes, legends, and units properly.
 Highlight important insights (not everything equally).
 Use interactive visualization if the dataset is huge.
❌ Don’ts:
 Don’t overload with too much information in one chart (avoid clutter).
 Don’t use misleading scales (e.g., truncated axis that exaggerates results).
 Don’t rely only on color (consider color-blind viewers).
 Don’t use 3D plots unnecessarily (often confusing unless needed).
 Don’t ignore context (data without explanation is easily misread).
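
The warning about truncated axes can be checked with simple arithmetic. The sketch below compares how tall two bars look relative to each other when the value axis starts at zero and when it starts just below the smaller value. The numbers and the function name are made up for illustration.

```python
def apparent_ratio(a, b, axis_start=0.0):
    """Ratio of the drawn heights of bars b and a when the axis starts at axis_start."""
    return (b - axis_start) / (a - axis_start)

# Two hypothetical values that differ by 5%.
honest = apparent_ratio(100, 105)                    # axis starts at 0
truncated = apparent_ratio(100, 105, axis_start=95)  # axis starts at 95

print(f"axis from 0:  second bar looks {honest:.2f}x as tall")
print(f"axis from 95: second bar looks {truncated:.2f}x as tall")
```

With the axis starting at 95, a 5% difference is drawn as a bar twice as tall, which is how a truncated axis exaggerates a result.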
