Da Unit 1
Da Unit 1
Data Pivot table and Heat Map: Highlight Table, Bar Chart, Line Chart,
Visualization/Graph Area Chart, Pie Chart, Scatter Plot, Word Cloud, Tree Map, Blended
Axis, Dual Axis Advance Data Visualizations: Bar Chart, Line Chart, Dual Axis Chart, Other
Advanced Chart
Building View Explain latitude and longitude: Default location/Edit locations,
Advance Map Symbol map & Filled Map, Map Layer, Image in map, Map option
Option
Data Preparation Connecting to different Data Source: Excel, CSV, SQL Server ; Live vs Extract Connection:
Creating Extrac, Refreshing Extract, Increment Extract, Refreshing Live, Data Source
Editor, Pivoting and splitting, Data Interpreter: Clean Dirty Data, TWB vs TWBX, How to
create a packaged workbook, Difference between .tde and .hyper file
Advanced Data Joins: Inner, Left, Right, Outer Complex Join; Referential Integrity; Union; Data Blending
Preparation and when required; Cross DB Join
What is Tableau?
• Definition: Tableau is a leading data visualization and business intelligence (BI) tool used for
converting raw data into interactive, understandable, and shareable visualizations.
• Purpose: Helps organizations analyze large datasets, identify trends, and make data-driven
decisions quickly.
• Key Strength: Drag-and-drop interface with minimal coding required.
Features of Tableau
• Connects to multiple data sources (Excel, CSV, SQL, cloud platforms).
• Provides interactive dashboards.
• Supports real-time (live) and offline (extract) connections.
• Wide range of charts: Bar, Line, Pie, Heatmap, Tree map, Scatter, Maps.
• Easy sharing through Tableau Server, Tableau Online, or Tableau Public.
• Advanced options: forecasting, clustering, trend lines, calculated fields.
• Drag-and-drop interface for ease of use.
• Real-time data analysis and updates.
• Interactive dashboards and story points.
• Data blending and joining.
• Extensive data connectivity.
• Advanced visual analytics (forecasting, clustering).
• Sharing and collaboration tools.
•
tableau Architecture
1. Data Sources
• Tableau can connect to multiple data sources such as:
o Relational databases: SQL Server, MySQL, Oracle, Teradata, IBM DB2, SAP,
Sybase.
o Cloud & Big Data: Google BigQuery, Amazon Redshift, Hadoop, etc.
o Files: Excel, CSV, PDFs.
o APIs/Other: Twitter, Microsoft Dynamics, etc.
Connection Types:
• Live → Direct connection to source (real-time updates).
• Extract → Data snapshot stored in .tde or .hyper for fast offline analysis.
5. Users
• Analysts (Creators) → Build and publish reports.
• Business Users (Viewers) → Interact with reports (filter, drill-down).
• Executives (Consumers) → View dashboards for decision-making.
Tableau Interface
The Tableau interface includes the following main elements:
1. Menu Bar and Toolbar
• Located at the top of the window.
• Provides access to commands and quick tools such as Save, Undo/Redo, Sort, Add Trend
Lines, Export, and Connect to Data.
2. Data Pane
• Found on the left-hand side.
• Displays all connected data sources and fields.
• Contains:
o Dimensions → Qualitative fields (e.g., Country, Category).
o Measures → Quantitative fields (e.g., Sales, Profit).
• Fields from the Data Pane are dragged into the workspace or shelves.
3. Workspace (Canvas)
• The central area used for creating visualizations.
• Drag-and-drop fields to build charts, dashboards, and stories.
• Works as the main design space.
4. Shelves
• Located above and around the workspace.
• Used to place fields for structuring visualizations.
• Types of shelves:
o Columns Shelf → Places fields horizontally.
o Rows Shelf → Places fields vertically.
o Filters Shelf → Restricts what data appears in the view.
o Pages Shelf → Breaks visualization into multiple pages (e.g., time-based
animations).
• Marks Card → Controls how data appears (color, size, label, detail, shape).
5. Show Me Panel
• Suggests the best visualization type based on selected fields.
• Example: Selecting one measure and one dimension → Suggests Bar/Line Chart.
6. Sheet Tabs
• Located at the bottom of the interface.
• Help in navigating between:
o Worksheets → Individual visualizations.
o Dashboards → Combination of multiple worksheets.
o Stories → Sequential combination of dashboards for presentations.
Exam-Ready Summary
• Menu Bar/Toolbar → Commands & tools.
• Data Pane → Dimensions & Measures from data sources.
• Workspace → Canvas for building visualizations.
• Shelves → Columns, Rows, Filters, Pages, Marks.
• Show Me Panel → Suggests chart types.
• Sheet Tabs → Switch between Worksheets, Dashboards, Stories.
2. Heat Map
• Definition: A chart that uses color intensity to represent values in a matrix.
• Key Points:
o Darker/brighter colors = higher/lower values.
o Helps identify patterns and outliers.
• Example: Product vs Region Sales → High sales = dark green, low sales = light green.
3. Highlight Table
• Definition: A type of table where values are highlighted using colors.
• Key Points:
o Similar to Heat Map but applied to tabular data.
o Useful for quick comparisons.
• Example: Monthly Profit by Product with cells shaded in red (low) and green (high).
4. Bar Chart
• Definition: Represents categorical data using rectangular bars proportional to values.
• Key Points:
o Can be vertical or horizontal.
o Easy comparison between categories.
• Example: Sales by Product Category (Furniture vs Office Supplies vs Technology).
5. Line Chart
• Definition: Displays data points connected by a line.
• Key Points:
o Best for time-series trends.
o Shows increase/decrease over time.
• Example: Monthly Sales trend from Jan to Dec.
6. Area Chart
• Definition: Similar to a line chart, but the area under the line is filled with color.
• Key Points:
o Highlights magnitude of change.
o Useful for comparing cumulative values.
• Example: Revenue vs Expenses over time, showing the gap filled in different colors.
7. Pie Chart
• Definition: A circular chart divided into slices representing parts of a whole.
• Key Points:
o Shows percentage contribution of categories.
o Should be used with fewer categories (3–6).
• Example: Market share of companies (Apple, Samsung, Others).
8. Scatter Plot
• Definition: A chart with points plotted on an X-Y axis to show relationships.
• Key Points:
o Useful for correlation analysis.
o Can add trend lines.
• Example: Advertising Spend (X) vs Sales Revenue (Y).
9. Word Cloud
• Definition: A visualization where word size represents frequency or importance.
• Key Points:
o Larger words = higher frequency.
o Often used in text analytics.
• Example: Analyzing customer reviews → words like “Good,” “Fast,” “Expensive” sized by
frequency.
13. Histogram
• Definition: Displays the frequency distribution of a continuous measure by dividing values
into bins.
• Key Points: Shows how often values fall within ranges.
• Example: Distribution of Order Quantities → Most orders fall in the 5–10 items range.
2. Blended Axis
• Definition: Places multiple measures on the same axis for direct comparison.
• Purpose: Makes charts simpler by avoiding multiple axes.
• Example: Displaying Sales and Profit on a single Y-axis in a bar chart.
3. Histogram
• Definition: A chart that shows the frequency distribution of a measure by grouping values
into bins.
• Purpose: Analyze data distribution.
• Example: Distribution of Order Quantities (how many orders fall in the 1–5, 6–10, etc.
ranges).
4. Gantt Chart
• Definition: Horizontal bar chart showing task duration, start and end dates.
• Purpose: Project management & scheduling.
• Example: Project Plan showing each task timeline.
5. Bullet Chart
• Definition: Variation of bar chart that compares a measure against a target with
background performance bands.
• Purpose: Track progress against goals.
• Example: Actual Sales vs Target Sales, with ranges (Poor–Good–Excellent).
6. Tree Map
• Definition: Uses nested rectangles sized and colored by values.
• Purpose: Visualize hierarchical data in compact form.
• Example: Sales by Category → Subcategory.
7. Word Cloud
• Definition: Visualizes text frequency; word size = frequency or importance.
• Purpose: Text analytics and customer feedback.
• Example: Customer reviews → “Good”, “Expensive”, “Fast” sized by occurrence.
data connection
A data connection is essentially a link between your Tableau workbook and the data you want to
analyze. Following types of data sources can be connected to the Tableau:
File System: For example, Microsoft Excel, CSV, etc.
Cloud System: For example, Google Big Query, Microsoft Azure, etc.
Relational System: For example, Microsoft SQL Server, DB2, Oracle, etc.
Other Sources: For example, ODBC.
Exam-Ready Summary
• Tableau supports File-based, Cloud, Relational, and ODBC connections.
• CSV/Text File → Quick flat-file connection.
• Excel File → Multiple sheets, easy drag-and-drop.
• SQL Server/MySQL → Requires credentials, supports queries, ideal for large datasets.
Live vs Extract Connection in Tableau
1. Live Connection
• Definition: A direct link to the data source. Tableau queries the source in real time.
• Advantages:
o Real-Time Updates – Always fetches the latest data.
o No Data Duplication – Data stays in the source.
o Less Local Storage – Does not require saving data on your machine.
• Disadvantages:
o Performance Dependency – Slow database = slow dashboards.
o Requires Internet/Server Access – No offline analysis.
o Security Concerns – Directly connected to source, can raise access issues.
• Example: Stock market dashboard connected live to a financial database for real-time
prices.
2. Extract Connection
• Definition: A snapshot of the data saved locally in Tableau’s optimized format (.tde or
.hyper).
• Advantages:
o Faster Performance – Queries run on local optimized file.
o Offline Access – Can be used without internet or server connection.
o Data Transformation – Filtering, aggregation, and custom datasets possible during
extract creation.
• Disadvantages:
o Data Staleness – Snapshot may become outdated. Needs refreshing.
o Storage Requirement – Takes up local disk space.
o Data Duplication – Creates an extra copy of the source data.
• Example: Monthly sales report stored as extract for quick offline analysis.
3. Creating Extracts
• Steps:
1. Connect to data source.
2. In Data Menu, select Extract Data.
3. Apply filters/aggregations if required.
4. Save extract as .hyper file.
4. Refreshing Extracts
• Two options:
1. Refresh like a live source (updates when workbook opens).
2. Refresh Extract → Creates a new snapshot from original data.
• Incremental Refresh: Only new rows added since last refresh are appended (saves time vs
full refresh).
Exam-Ready Summary
• Live Connection → Real-time, no duplication, requires source connection, slower if DB is
slow.
• Extract Connection → Snapshot (.hyper), faster, offline, but can get outdated.
• Creating Extract → Save filtered/aggregated snapshot.
• Refreshing Extract → Can be full or incremental.
• Refreshing Live → Queries DB every time → always latest data.
• Data Source Editor → Manage, clean, and join data.
• Pivoting → Convert columns → rows.
• Splitting → Divide a field into multiple fields.
Exam-Ready Summary
• Data Interpreter → Cleans messy Excel/CSV/PDF data → identifies headers, sub-tables,
removes junk rows.
• TWB → XML file, contains visualization instructions, no data included, links to original
source.
• TWBX → Packaged file, includes workbook + data + images, ideal for sharing.
• Creating Packaged Workbook → File → Save As → Choose .twbx.
3. Comparison Table
Feature .tde (Tableau Data Extract) .hyper (Hyper Extract)
Introduced Older extract format Tableau 10.5 onwards
Data Size Medium datasets Large, complex datasets
Performance Good, but limited for huge Very high, optimized for big data
data
Technology Single-threaded, older engine Multi-threaded, new Hyper engine
Use Case Smaller extracts, legacy Modern extracts, enterprise-level
workbooks analysis
b) Left Join
• Returns all records from left table, plus matching from right.
• Non-matches → NULL.
• Example: Orders + Customers → All orders included, even if customer info missing.
c) Right Join
• Opposite of Left Join → all records from right table + matching from left.
• Non-matches → NULL.
• Example: Customers + Orders → All customers shown, even those who never ordered.
2. Complex Joins
• Joins involving multiple conditions or tables.
• Example: Orders + Customers + Products → to know which customer bought which
product.
• Tableau supports up to 32 tables in a join.
3. Referential Integrity
• Definition: Ensures foreign keys in one table always exist in another (primary key).
• Use in Tableau:
o Option: Assume Referential Integrity.
o Queries only necessary tables → improves performance.
• Example: Employee Table (DeptID) + Department Table (DeptID).
4. Union
• Definition: Appends rows from multiple tables with identical structure.
• Example: Sales_Q1, Sales_Q2, Sales_Q3, Sales_Q4 → Combined into one Sales table.
• Before Union: Separate sheets (Q1, Q2…).
• After Union: Single table with all records stacked.
5. Data Blending
• Definition: Combines data from different sources at visualization level (not in DB).
• Works via Primary Source (blue) + Secondary Source (orange).
• Needs a common linking field (e.g., Date, Region, ID).
• When Used:
o Different databases (Excel + SQL).
o When join is not possible.
• Example: Blend Excel Sales with SQL Targets on Region field.
6. Cross-Database Join
• Definition: Directly joins tables from different databases inside Tableau.
• Example: MySQL → Orders table joined with Oracle → Inventory table on ProductID.
• Difference from Blending: Happens at database/query level (not viz level).
1. Calculated Field
• Definition: A Calculated Field is a custom field created in Tableau by applying formulas or
expressions on existing fields.
• Purpose: Helps create new measures or dimensions not directly present in the data source.
• How to Create:
o Menu → Analysis → Create Calculated Field.
• Examples:
o Profit Ratio = [Profit] / [Sales]
o Full Name = [First Name] + " " + [Last Name]
Use Case: To calculate Profit Margin % when only Sales and Profit are available.
2. Measure Values
• Definition: Measure Values is a special field in Tableau that represents all the numerical
fields (measures) in the dataset.
• Purpose: Allows displaying multiple measures in a single view.
• How it works:
o When you drag “Measure Values” to Rows/Columns, Tableau plots all selected
measures together.
• Examples:
o Displaying Sales, Profit, Discount in one table using Measure Values.
Use Case: To show different KPIs (Sales, Profit, Quantity) side by side in one visualization.
3. Measure Names
• Definition: Measure Names is another special field that contains the names of all measures
in the dataset.
• Purpose: Works together with Measure Values to filter/select which measures to display.
• How it works:
o Drag “Measure Names” to Rows/Columns or Filters → Choose which measures to
show.
• Examples:
o Select only “Sales” and “Profit” to appear, while excluding “Quantity” and
“Discount”.
Use Case: To dynamically switch between measures in dashboards (e.g., toggle between Sales
and Profit).
Summary Table
Concept Definition Example Use Case
Custom field Create new KPIs
Calculated [Profit]/[Sales] = Profit
created using like Profit Margin
Field Ratio
formulas %
Display multiple
Measure Holds all numeric Sales, Profit, Discount
measures in one
Values measures in dataset shown together
chart
Measure Holds the names of “Sales”, “Profit”, Filter which
Names all measures “Quantity” measures to display