
Da Unit 1

Notes on data visualization with Tableau, Unit 1

Uploaded by

misha
Copyright
© All Rights Reserved

Unit-1 (Contact Hours: 10)

Introduction to Tableau | What is Tableau: Architecture of Tableau, Features of Tableau, Interface of Tableau (Layout, Toolbars, Data pane, Analytics pane, etc.), Introduction to the various file types
Data Visualization/Graph | Pivot Table and Heat Map: Highlight Table, Bar Chart, Line Chart, Area Chart, Pie Chart, Scatter Plot, Word Cloud, Tree Map, Blended Axis, Dual Axis. Advanced Data Visualizations: Bar Chart, Line Chart, Dual Axis Chart, Other Advanced Charts
Building View, Advanced Map Options | Explain latitude and longitude: Default location/Edit locations, Symbol Map & Filled Map, Map Layer, Image in map, Map options
Data Preparation | Connecting to different Data Sources: Excel, CSV, SQL Server; Live vs Extract Connection: Creating Extract, Refreshing Extract, Incremental Extract, Refreshing Live, Data Source Editor, Pivoting and splitting, Data Interpreter: Clean Dirty Data, TWB vs TWBX, How to create a packaged workbook, Difference between .tde and .hyper file
Advanced Data Preparation | Joins: Inner, Left, Right, Outer, Complex Join; Referential Integrity; Union; Data Blending and when required; Cross DB Join

What is Tableau?
• Definition: Tableau is a leading data visualization and business intelligence (BI) tool used for
converting raw data into interactive, understandable, and shareable visualizations.
• Purpose: Helps organizations analyze large datasets, identify trends, and make data-driven
decisions quickly.
• Key Strength: Drag-and-drop interface with minimal coding required.
Features of Tableau
• Connects to multiple data sources (Excel, CSV, SQL, cloud platforms).
• Provides interactive dashboards.
• Supports real-time (live) and offline (extract) connections.
• Wide range of charts: Bar, Line, Pie, Heatmap, Tree map, Scatter, Maps.
• Easy sharing through Tableau Server, Tableau Online, or Tableau Public.
• Advanced options: forecasting, clustering, trend lines, calculated fields.
• Drag-and-drop interface for ease of use.
• Real-time data analysis and updates.
• Interactive dashboards and story points.
• Data blending and joining.
• Extensive data connectivity.
• Advanced visual analytics (forecasting, clustering).
• Sharing and collaboration tools.

Tableau Architecture
1. Data Sources
• Tableau can connect to multiple data sources such as:
o Relational databases: SQL Server, MySQL, Oracle, Teradata, IBM DB2, SAP,
Sybase.
o Cloud & Big Data: Google BigQuery, Amazon Redshift, Hadoop, etc.
o Files: Excel, CSV, PDFs.
o APIs/Other: Twitter, Microsoft Dynamics, etc.
Connection Types:
• Live → Direct connection to source (real-time updates).
• Extract → Data snapshot stored in .tde or .hyper for fast offline analysis.

2. Tableau Desktop (Authoring Layer)


• Analysts work here to create dashboards and reports.
• They can:
o Connect to sources (live/extract).
o Build worksheets, charts, and dashboards.
o Publish results.

3. Data Engine & Storage


• Extracted data is stored in TDE/Hyper files.
• Key elements:
o Cache – improves performance.
o Repository – stores metadata, workbook details, user preferences.
o Security – manages user authentication & permissions.
o Automation – refresh schedules, data updates.

4. Publishing & Distribution Layer


• Once dashboards are created, they are published for sharing.
• Options:
1. Tableau Reader – Local viewing of packaged workbooks (.twbx).
2. Static Readers – Export as PDF, Excel, PowerPoint (no drill-down or filtering).
3. Web & Mobile Users – Access via Tableau Server or Tableau Online (mass
consumption).

5. Users
• Analysts (Creators) → Build and publish reports.
• Business Users (Viewers) → Interact with reports (filter, drill-down).
• Executives (Consumers) → View dashboards for decision-making.

Summary (For Exams)


• Tableau architecture has 3 major parts:
1. Data Sources (Excel, SQL, Cloud, etc.)
2. Authoring & Data Engine Layer (Desktop → Live/Extract → Repository, Security,
Cache)
3. Users Layer (Reader, Static export, Server for web/mobile)
• Flow: Data → Extract/Live → Tableau Desktop → Publish → End Users.

Tableau Interface
The Tableau interface includes the following main elements:
1. Menu Bar and Toolbar
• Located at the top of the window.
• Provides access to commands and quick tools such as Save, Undo/Redo, Sort, Add Trend
Lines, Export, and Connect to Data.

2. Data Pane
• Found on the left-hand side.
• Displays all connected data sources and fields.
• Contains:
o Dimensions → Qualitative fields (e.g., Country, Category).
o Measures → Quantitative fields (e.g., Sales, Profit).
• Fields from the Data Pane are dragged into the workspace or shelves.
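The Dimension/Measure split can be illustrated with a small Python sketch. It assumes a simplified rule (numeric columns become measures, everything else becomes a dimension); Tableau's own assignment has more nuance, and the sample rows and the classify_fields helper are hypothetical.

```python
# Hypothetical sample rows; the field names are illustrative only.
rows = [
    {"Country": "India", "Category": "Furniture", "Sales": 1200.0, "Profit": 150.0},
    {"Country": "France", "Category": "Technology", "Sales": 800.0, "Profit": -20.0},
]

def classify_fields(records):
    """Split field names into qualitative (dimension) and quantitative (measure) lists."""
    dimensions, measures = [], []
    for name in records[0]:
        values = [r[name] for r in records]
        # bool is excluded because it is a subclass of int in Python.
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        (measures if is_numeric else dimensions).append(name)
    return dimensions, measures

dimensions, measures = classify_fields(rows)
print(dimensions)  # ['Country', 'Category']
print(measures)    # ['Sales', 'Profit']
```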

3. Workspace (Canvas)
• The central area used for creating visualizations.
• Drag-and-drop fields to build charts, dashboards, and stories.
• Works as the main design space.

4. Shelves
• Located above and around the workspace.
• Used to place fields for structuring visualizations.
• Types of shelves:
o Columns Shelf → Places fields horizontally.
o Rows Shelf → Places fields vertically.
o Filters Shelf → Restricts what data appears in the view.
o Pages Shelf → Breaks visualization into multiple pages (e.g., time-based
animations).
• Marks Card → Controls how data appears (color, size, label, detail, shape).

5. Show Me Panel
• Suggests the best visualization type based on selected fields.
• Example: Selecting one measure and one dimension → Suggests Bar/Line Chart.

6. Sheet Tabs
• Located at the bottom of the interface.
• Help in navigating between:
o Worksheets → Individual visualizations.
o Dashboards → Combination of multiple worksheets.
o Stories → Sequential combination of worksheets and dashboards for presentations.
Exam-Ready Summary
• Menu Bar/Toolbar → Commands & tools.
• Data Pane → Dimensions & Measures from data sources.
• Workspace → Canvas for building visualizations.
• Shelves → Columns, Rows, Filters, Pages, Marks.
• Show Me Panel → Suggests chart types.
• Sheet Tabs → Switch between Worksheets, Dashboards, Stories.

Tableau – File Types


Tableau uses different file formats for workbooks, data extracts, packaged files, and
preferences. Each serves a specific purpose.

1. Workbook Files (.twb)


• TWB = Tableau Workbook
• Stores:
o Worksheets, Dashboards, Stories
o Connections to data sources
o Metadata (field types, calculated fields, formatting)
• Does not contain the actual data → only stores a link to the data source.
• Useful when sharing within the same environment (where everyone has access to the data
source).
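Because a .twb file is XML, it can be inspected with any XML parser. The sketch below parses a heavily simplified, hypothetical stand-in for a workbook; a real .twb contains many more elements and attributes, so treat the sample structure as an assumption.

```python
import xml.etree.ElementTree as ET

# A heavily simplified, hypothetical stand-in for a .twb file.
# Note that it holds names and references only, not the data itself.
SAMPLE_TWB = """<workbook>
  <datasources>
    <datasource name="Orders" />
  </datasources>
  <worksheets>
    <worksheet name="Sales by Region" />
    <worksheet name="Profit Trend" />
  </worksheets>
</workbook>"""

root = ET.fromstring(SAMPLE_TWB)
datasource_names = [d.get("name") for d in root.iter("datasource")]
worksheet_names = [w.get("name") for w in root.iter("worksheet")]

print(datasource_names)  # ['Orders']
print(worksheet_names)   # ['Sales by Region', 'Profit Trend']
```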

2. Packaged Workbook Files (.twbx)


• TWBX = Tableau Packaged Workbook
• Contains:
o Everything in a .twb file + data extracts, local data files, and images.
• Like a zip file → self-contained.
• Useful for sharing with others who don’t have direct access to the data source.
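The "like a zip file" idea can be sketched with Python's zipfile module. The archive below is built in memory as a stand-in for a packaged workbook; the member names and folder layout are hypothetical, not a statement of what Tableau writes.

```python
import io
import zipfile

# Build a tiny stand-in archive in memory; all file names are hypothetical.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr("Dashboard.twb", "<workbook />")
    package.writestr("Data/Extracts/orders.hyper", b"placeholder bytes")
    package.writestr("Image/logo.png", b"placeholder bytes")

# Reading it back: the workbook definition travels together with its data.
with zipfile.ZipFile(buffer) as package:
    members = package.namelist()
    workbook_files = [m for m in members if m.endswith(".twb")]

print(workbook_files)  # ['Dashboard.twb']
```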

3. Data Extract Files (.tde / .hyper)


• TDE (Tableau Data Extract) – older extract format.
• Hyper – newer extract format (faster, supports larger datasets).
• Contains a snapshot of data stored in a columnar format.
• Used for:
o Faster performance
o Offline analysis
o Incremental refreshes

4. Data Source Files (.tds / .tdsx)


• TDS (Tableau Data Source)
o Contains connection information, fields, hierarchies, groups, calculated fields.
o Does not store actual data.
• TDSX (Packaged Data Source)
o Same as .tds but also includes the extract (.tde/.hyper) and local files.
o Useful for sharing a prepared data source with colleagues.

5. Bookmark Files (.tbm)


• Stores a single worksheet as a “bookmark.”
• Allows you to reuse a specific view in other workbooks.

6. Preferences File (.tps)


• XML file that stores custom color palettes and preferences.
• Helps maintain a consistent visual style across projects.

Summary Table (Exam-Ready)


File Type | Full Form | Contains | Data Included? | Use Case
.twb | Tableau Workbook | Sheets, dashboards, metadata | No | Linking to data sources
.twbx | Tableau Packaged Workbook | TWB + data, images | Yes | Sharing with others
.tde / .hyper | Tableau Data Extract | Extracted snapshot of data | Yes | Faster, offline analysis
.tds | Tableau Data Source | Connection details, metadata | No | Reuse connections
.tdsx | Packaged Data Source | TDS + extract & files | Yes | Sharing prepared data
.tbm | Tableau Bookmark | Single worksheet | No | Reuse views
.tps | Tableau Preferences | Color palettes, settings | No | Custom styles

Data Visualization / Graphs in Tableau


1. Pivot Table
• Definition: A table that summarizes and reorganizes data for analysis (like Excel Pivot
Table).
• Key Points:
o Groups and aggregates data by dimensions (e.g., Region, Category).
o Can show totals, averages, counts, percentages.
• Example: Summarizing Sales by Region and Category with totals for each.
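The grouping and totals behind a pivot table can be sketched in plain Python. The order rows and the pivot_sum helper are hypothetical; the point is only to show how values are aggregated per (Region, Category) cell and per Region.

```python
from collections import defaultdict

# Hypothetical order rows: (region, category, sales).
orders = [
    ("East", "Furniture", 100),
    ("East", "Technology", 250),
    ("West", "Furniture", 300),
    ("East", "Furniture", 50),
]

def pivot_sum(rows):
    """Sum sales for each (region, category) cell and for each region."""
    cells = defaultdict(int)
    row_totals = defaultdict(int)
    for region, category, sales in rows:
        cells[(region, category)] += sales
        row_totals[region] += sales
    return dict(cells), dict(row_totals)

cells, region_totals = pivot_sum(orders)
print(cells[("East", "Furniture")])  # 150
print(region_totals)                 # {'East': 400, 'West': 300}
```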

2. Heat Map
• Definition: A chart that uses color intensity to represent values in a matrix.
• Key Points:
o Color intensity encodes magnitude: typically, darker or more saturated colors indicate higher values and lighter colors indicate lower values.
o Helps identify patterns and outliers.
• Example: Product vs Region Sales → High sales = dark green, low sales = light green.

3. Highlight Table
• Definition: A type of table where values are highlighted using colors.
• Key Points:
o Similar to Heat Map but applied to tabular data.
o Useful for quick comparisons.
• Example: Monthly Profit by Product with cells shaded in red (low) and green (high).

4. Bar Chart
• Definition: Represents categorical data using rectangular bars proportional to values.
• Key Points:
o Can be vertical or horizontal.
o Easy comparison between categories.
• Example: Sales by Product Category (Furniture vs Office Supplies vs Technology).

5. Line Chart
• Definition: Displays data points connected by a line.
• Key Points:
o Best for time-series trends.
o Shows increase/decrease over time.
• Example: Monthly Sales trend from Jan to Dec.

6. Area Chart
• Definition: Similar to a line chart, but the area under the line is filled with color.
• Key Points:
o Highlights magnitude of change.
o Useful for comparing cumulative values.
• Example: Revenue vs Expenses over time, showing the gap filled in different colors.

7. Pie Chart
• Definition: A circular chart divided into slices representing parts of a whole.
• Key Points:
o Shows percentage contribution of categories.
o Should be used with fewer categories (3–6).
• Example: Market share of companies (Apple, Samsung, Others).

8. Scatter Plot
• Definition: A chart with points plotted on an X-Y axis to show relationships.
• Key Points:
o Useful for correlation analysis.
o Can add trend lines.
• Example: Advertising Spend (X) vs Sales Revenue (Y).

9. Word Cloud
• Definition: A visualization where word size represents frequency or importance.
• Key Points:
o Larger words = higher frequency.
o Often used in text analytics.
• Example: Analyzing customer reviews → words like “Good,” “Fast,” “Expensive” sized by
frequency.

10. Tree Map


• Definition: Uses nested rectangles sized and colored by values.
• Key Points:
o Good for hierarchical and comparative analysis.
o Space-efficient.
• Example: Sales by Category and Sub-Category → Larger rectangles = higher sales.

11. Blended Axis


• Definition: Combines multiple measures into a single axis for comparison.
• Key Points:
o Avoids overlapping multiple charts.
• Example: Sales and Profit shown together on one axis.
12. Dual Axis Chart
• Definition: Uses two axes (left and right) in a single visualization.
• Key Points:
o Helps compare two different measures with different scales.
o Can combine chart types (Bar + Line).
• Example: Sales (Bar) vs Profit Margin (Line) on the same chart.

Advanced Data Visualizations

13. Histogram
• Definition: Displays the frequency distribution of a continuous measure by dividing values
into bins.
• Key Points: Shows how often values fall within ranges.
• Example: Distribution of Order Quantities → Most orders fall in the 5–10 items range.
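Binning can be sketched in a few lines of Python. The quantities and the bin size of 5 are hypothetical; the sketch only shows how each value is assigned to a range and counted.

```python
from collections import Counter

# Hypothetical order quantities.
quantities = [1, 3, 5, 6, 7, 7, 9, 10, 12, 14]
BIN_SIZE = 5  # assumed bin width

def bin_label(value, size=BIN_SIZE):
    """Return the label of the bin that a value falls into, e.g. '5-9'."""
    start = (value // size) * size
    return f"{start}-{start + size - 1}"

histogram = Counter(bin_label(q) for q in quantities)
print(dict(histogram))  # {'0-4': 2, '5-9': 5, '10-14': 3}
```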

14. Gantt Chart


• Definition: Horizontal bar chart showing start and end times/duration of tasks.
• Key Points: Useful for project scheduling and timelines.
• Example: Project Plan → Each task bar shows duration from start to finish date.

15. Bullet Chart


• Definition: A variation of bar chart that compares a measure against a target.
• Key Points: Shows actual value, target marker, and qualitative ranges (good/average/poor).
• Example: Actual Sales (bar) vs Target Sales (marker) with background ranges.

16. Other Advanced Charts


• Bullet Chart → Measures performance against a target.
• Gantt Chart → Shows project timelines and schedules.
• Box-and-Whisker Plot → Displays data distribution (median, quartiles, outliers).
• Histogram → Shows frequency distribution of data values.
Example Uses:
• Bullet Chart → Compare actual sales vs sales target.
• Gantt Chart → Project schedule with start/end dates.
• Box Plot → Distribution of employee salaries.
• Histogram → Frequency of orders by order size.

Tableau: Building Views & Advanced Map Options
1. Building View in Tableau
• Definition: A view in Tableau is any visualization created from data (chart, table, or
dashboard).
• Process of Building a View:
1. Connect to Data – Load Excel, CSV, SQL, or any supported data source.
2. Drag Fields – Drag dimensions and measures from the Data Pane to the Rows and
Columns shelves.
3. Apply Filters – Place fields on Filters Shelf to refine data.
4. Use Marks Card – Change how marks look using Color, Size, Label, Detail,
Tooltip, and Shape.
5. Customize View – Add titles, legends, and interactivity.
6. Save as Worksheet or Dashboard – Combine multiple views in a dashboard for
business analysis.
Example: Creating a Sales by Region Bar Chart – drag Region (Dimension) to Columns,
Sales (Measure) to Rows → Tableau builds a bar chart automatically.

2. Advanced Map Options in Tableau


Maps in Tableau are powerful for geospatial analysis.
(a) Latitude and Longitude
• Definition: Every geographic role (Country, State, City, Postal Code) in Tableau is
internally assigned latitude (Y-axis) and longitude (X-axis).
• Default Location: Tableau automatically plots a location when a known geography is used
(e.g., "India" → places a mark on India’s centroid).
• Edit Locations: If Tableau cannot recognize a location, users can manually edit it by:
o Map menu → Edit Locations (or click the "unknown" indicator in the view) → Choose correct country/state/city.
o Helps fix spelling errors or ambiguous locations (e.g., “Paris” in USA vs France).
Example: If “Punjab” is not mapped correctly, you can edit the location and assign it to
“India → Punjab.”
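Why an ambiguous name needs extra context can be sketched as a lookup. The CANDIDATES table and the resolve helper are hypothetical and far smaller than any real geocoding database; the sketch only shows that a name with several matches stays unresolved until a country is supplied.

```python
# Hypothetical lookup of place names to candidate (country, state/region) pairs.
CANDIDATES = {
    "Paris": [("France", "Île-de-France"), ("United States", "Texas")],
    "Punjab": [("India", "Punjab"), ("Pakistan", "Punjab")],
}

def resolve(place, country=None):
    """Return the single matching location, or None if unknown or ambiguous."""
    matches = CANDIDATES.get(place, [])
    if country is not None:
        matches = [m for m in matches if m[0] == country]
    if len(matches) == 1:
        return matches[0]
    return None

print(resolve("Punjab"))           # None (ambiguous without a country)
print(resolve("Punjab", "India"))  # ('India', 'Punjab')
```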

(b) Symbol Map vs Filled Map


• Symbol Map:
o Displays data points as symbols (shapes or circles) at geographical positions.
o Size and color of the symbol can represent data values.
o Example: Number of Stores in each City shown as circles; larger circle = more
stores.
• Filled Map (Choropleth Map):
o Colors geographic areas (countries, states, districts) based on a measure.
o Example: Sales by State where darker color = higher sales.
Comparison:
• Use Symbol Map when you want to compare exact data points.
• Use Filled Map when analyzing data distribution over regions.

(c) Map Layers


• Tableau allows adding layers on maps for better context.
• Layers include:
o Country/State/City borders.
o Coastlines and water bodies.
o Highways, postal codes.
o Custom layers (boundaries or shapefiles).
Example: Adding State Borders on a filled map of India to distinguish performance
within each state.

(d) Image in Map (Background Images)


• Users can add custom background images (e.g., floor plans, building layouts, site maps).
• Requires mapping image coordinates to latitude & longitude (or X & Y coordinates).
• Useful in IoT, engineering, and retail analysis.
Example: Uploading a store floor plan and mapping customer footfall data to see
hotspots.

(e) Map Options


• The Map Options dialog (Map → Map Options), together with the Map Layers pane, allows customization such as:
o Show/Hide map search box.
o Display map scale (distance).
o Show map layers (country borders, names, water bodies).
o Control zoom, pan, and map style (normal, dark, light, satellite).
o Add annotations for specific locations.
Example: Activating the Map Scale to measure distances between two cities.

Summary (Exam-Ready Points)


1. Building View – Create visualization by dragging Dimensions/Measures → Customize with
Marks, Filters, and Shelves → Save as Worksheet/Dashboard.
2. Latitude & Longitude – Tableau uses geo-coordinates; edit locations for accuracy.
3. Symbol Map – Data shown as circles/shapes; good for exact data points.
4. Filled Map – Regions shaded by measure; good for distribution analysis.
5. Map Layers – Add borders, roads, postal codes for more detail.
6. Image in Map – Insert custom images like floor plans with mapped coordinates.
7. Map Options – Control scale, style, zoom, annotations.

For more notes on this topic, refer to PPT 1.1.3 onwards.


1. What are the core components of Advanced Data Visualizations?
Advanced Data Visualizations in Tableau are techniques and charts that go beyond the basics (bar,
line, pie, scatter) to provide comparisons, dual measures, distributions, timelines, and performance
vs target analysis.
Core Components:
• Dual Axis Chart – compare two measures with two axes.
• Blended Axis – multiple measures on a single axis.
• Histogram – shows frequency distribution.
• Gantt Chart – displays task duration/timeline.
• Bullet Chart – compares actual vs target performance.
• Tree Map – shows hierarchical data with nested rectangles.
• Word Cloud – represents text frequency with word size.
Core Components of Advanced Data Visualizations in Tableau
1. Dual Axis Chart
• Definition: Combines two different measures in one chart using two axes (left & right).
• Purpose: Compare measures with different scales.
• Example: Sales (Bar) vs Profit Margin (Line) over time.

2. Blended Axis
• Definition: Places multiple measures on the same axis for direct comparison.
• Purpose: Makes charts simpler by avoiding multiple axes.
• Example: Displaying Sales and Profit on a single Y-axis in a bar chart.

3. Histogram
• Definition: A chart that shows the frequency distribution of a measure by grouping values
into bins.
• Purpose: Analyze data distribution.
• Example: Distribution of Order Quantities (how many orders fall in the 1–5, 6–10, etc.
ranges).

4. Gantt Chart
• Definition: Horizontal bar chart showing task duration, start and end dates.
• Purpose: Project management & scheduling.
• Example: Project Plan showing each task timeline.

5. Bullet Chart
• Definition: Variation of bar chart that compares a measure against a target with
background performance bands.
• Purpose: Track progress against goals.
• Example: Actual Sales vs Target Sales, with ranges (Poor–Good–Excellent).

6. Tree Map
• Definition: Uses nested rectangles sized and colored by values.
• Purpose: Visualize hierarchical data in compact form.
• Example: Sales by Category → Subcategory.

7. Word Cloud
• Definition: Visualizes text frequency; word size = frequency or importance.
• Purpose: Text analytics and customer feedback.
• Example: Customer reviews → “Good”, “Expensive”, “Fast” sized by occurrence.

Data Connection
A data connection is essentially a link between your Tableau workbook and the data you want to
analyze. The following types of data sources can be connected to Tableau:
• File System: For example, Microsoft Excel, CSV, etc.
• Cloud System: For example, Google BigQuery, Microsoft Azure, etc.
• Relational System: For example, Microsoft SQL Server, DB2, Oracle, etc.
• Other Sources: For example, ODBC.

A. Connection with Text File (CSV/TSV/TXT)


Steps:
1. Open Tableau Desktop.
2. Select Text File under Connect.
3. Browse and choose the file (e.g., sales.csv).
4. Tableau loads the file in the Data Source Page (fields visible in Data Pane).
5. You can start using the columns → Dimensions (text fields) or Measures (numeric fields).
Example: Connecting customers.csv to analyze Customer Age vs Purchases.

B. Connection with Excel File


Steps:
1. Open Tableau Desktop → Click Microsoft Excel.
2. Select .xlsx file.
3. Tableau shows Sheets (e.g., Orders, Returns).
4. Drag one or more sheets into the workspace.
5. Dimensions and Measures get populated automatically.
Example: Connecting SalesData.xlsx → Drag Orders sheet → Build Sales by Region Bar Chart.

C. Connection with Database (SQL/MySQL/SQL Server)


Steps:
1. Open Tableau Desktop → Choose Database (e.g., MySQL, SQL Server).
2. A dialog box appears:
o Enter Server name & Port (default: 3306 for MySQL, 1433 for SQL Server).
o Provide Username & Password.
o Optionally enable SSL or run Initial SQL commands.
3. Select the required database/schema.
4. Choose tables or write custom SQL queries.
Example: Connect to SQL Server → Database: RetailDB → Run SQL:
SELECT Region, SUM(Sales) AS TotalSales
FROM Orders
GROUP BY Region;
→ Tableau builds a Regional Sales Chart.
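The query above can be tried without a database server by using Python's built-in sqlite3 module. This is only a sketch: the in-memory table and its three rows are hypothetical, SQLite stands in for SQL Server, and an ORDER BY clause is added so the output order is predictable.

```python
import sqlite3

# In-memory database standing in for RetailDB; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (Region TEXT, Sales REAL)")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?)",
    [("East", 100.0), ("West", 250.0), ("East", 50.0)],
)

query = """
SELECT Region, SUM(Sales) AS TotalSales
FROM Orders
GROUP BY Region
ORDER BY Region;
"""
regional_sales = conn.execute(query).fetchall()
conn.close()

print(regional_sales)  # [('East', 150.0), ('West', 250.0)]
```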

Exam-Ready Summary
• Tableau supports File-based, Cloud, Relational, and ODBC connections.
• CSV/Text File → Quick flat-file connection.
• Excel File → Multiple sheets, easy drag-and-drop.
• SQL Server/MySQL → Requires credentials, supports queries, ideal for large datasets.
Live vs Extract Connection in Tableau

1. Live Connection
• Definition: A direct link to the data source. Tableau queries the source in real time.
• Advantages:
o Real-Time Updates – Always fetches the latest data.
o No Data Duplication – Data stays in the source.
o Less Local Storage – Does not require saving data on your machine.
• Disadvantages:
o Performance Dependency – Slow database = slow dashboards.
o Requires Internet/Server Access – No offline analysis.
o Security Concerns – Directly connected to source, can raise access issues.
• Example: Stock market dashboard connected live to a financial database for real-time
prices.

2. Extract Connection
• Definition: A snapshot of the data saved locally in Tableau’s optimized format (.tde or
.hyper).
• Advantages:
o Faster Performance – Queries run on local optimized file.
o Offline Access – Can be used without internet or server connection.
o Data Transformation – Filtering, aggregation, and custom datasets possible during
extract creation.
• Disadvantages:
o Data Staleness – Snapshot may become outdated. Needs refreshing.
o Storage Requirement – Takes up local disk space.
o Data Duplication – Creates an extra copy of the source data.
• Example: Monthly sales report stored as extract for quick offline analysis.
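The difference between the two connection types can be sketched with a list standing in for the database. All names here are hypothetical: the live path reads the source each time, while the extract is a copy taken at one moment and goes stale when the source changes.

```python
import copy

# A list standing in for the source database table (hypothetical data).
source_table = [{"Product": "Chair", "Sales": 100}]

def live_query():
    """Live connection: always reads the current state of the source."""
    return source_table

# Extract connection: a snapshot copied at this moment.
extract = copy.deepcopy(source_table)

# The source changes after the extract was created.
source_table.append({"Product": "Desk", "Sales": 300})

live_count = len(live_query())
extract_count = len(extract)
print(live_count)     # 2 (sees the new row)
print(extract_count)  # 1 (stale until the extract is refreshed)
```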

3. Creating Extracts
• Steps:
1. Connect to data source.
2. In Data Menu, select Extract Data.
3. Apply filters/aggregations if required.
4. Save extract as .hyper file.

4. Refreshing Extracts
• Two options:
1. Full Refresh → Replaces the entire extract with a new snapshot from the original data.
2. Incremental Refresh → Only new rows added since the last refresh are appended, identified by a chosen column such as a date or ID (saves time vs full refresh).
• Refreshes can be run manually in Tableau Desktop or scheduled on Tableau Server/Tableau Online.
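An incremental refresh can be sketched as "append only the rows whose key is larger than the largest key already in the extract". The OrderID key column and the incremental_refresh helper are assumptions for illustration; this is not Tableau's internal implementation.

```python
def incremental_refresh(extract_rows, source_rows, key="OrderID"):
    """Append only source rows whose key exceeds the largest key in the extract."""
    last_key = max((r[key] for r in extract_rows), default=None)
    new_rows = [r for r in source_rows if last_key is None or r[key] > last_key]
    return extract_rows + new_rows, len(new_rows)

# Hypothetical data: the extract holds orders 1-2, the source now holds 1-4.
extract_rows = [{"OrderID": 1}, {"OrderID": 2}]
source_rows = [{"OrderID": 1}, {"OrderID": 2}, {"OrderID": 3}, {"OrderID": 4}]

refreshed, appended = incremental_refresh(extract_rows, source_rows)
print(appended)        # 2
print(len(refreshed))  # 4
```

One consequence of this design is that changes to rows already in the extract are not picked up; only a full refresh would capture those.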

5. Refreshing Live Data Source


• Every time the workbook opens or you press Refresh, Tableau queries the connected
database.
• Data is always the latest but performance depends on the database speed.

6. Data Source Editor


• Tool for managing and cleaning data connections.
• Functions:
o Rename fields, hide unused columns.
o Define joins/relationships.
o Manage extract filters.
o Preview data tables before visualization.

7. Pivoting and Splitting


• Pivoting: Converts columns into rows to restructure data.
o Example: Pivot "Sales_Q1, Sales_Q2, Sales_Q3, Sales_Q4" into two columns: "Quarter" (the original column names) and "Sales" (the corresponding values).
• Splitting: Breaks a single field into multiple fields based on delimiter.
o Example: Split FullName = "John Smith" into First Name = John and Last Name
= Smith.
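Both operations can be sketched in plain Python. The wide_rows sample and the pivot_columns helper are hypothetical; the pivot turns four quarter columns into (Quarter, Sales) rows, and the split breaks a name on the first space.

```python
# Hypothetical wide-format row with one column per quarter.
wide_rows = [
    {"Product": "Chair", "Sales_Q1": 10, "Sales_Q2": 20, "Sales_Q3": 30, "Sales_Q4": 40},
]

def pivot_columns(rows, id_field, value_fields, name_col="Quarter", value_col="Sales"):
    """Turn each listed column into its own row (wide format to long format)."""
    long_rows = []
    for row in rows:
        for field in value_fields:
            long_rows.append(
                {id_field: row[id_field], name_col: field, value_col: row[field]}
            )
    return long_rows

long_rows = pivot_columns(
    wide_rows, "Product", ["Sales_Q1", "Sales_Q2", "Sales_Q3", "Sales_Q4"]
)
print(len(long_rows))  # 4
print(long_rows[0])    # {'Product': 'Chair', 'Quarter': 'Sales_Q1', 'Sales': 10}

# Splitting: break one field into two on the first space delimiter.
first_name, last_name = "John Smith".split(" ", 1)
print(first_name, last_name)  # John Smith
```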

Exam-Ready Summary
• Live Connection → Real-time, no duplication, requires source connection, slower if DB is
slow.
• Extract Connection → Snapshot (.hyper), faster, offline, but can get outdated.
• Creating Extract → Save filtered/aggregated snapshot.
• Refreshing Extract → Can be full or incremental.
• Refreshing Live → Queries DB every time → always latest data.
• Data Source Editor → Manage, clean, and join data.
• Pivoting → Convert columns → rows.
• Splitting → Divide a field into multiple fields.

Data Interpreter, TWB vs TWBX, Packaged Workbook
1. Data Interpreter: Cleaning Dirty Data
• Definition: Data Interpreter in Tableau is a built-in tool that helps clean messy/dirty data
from Excel, CSV, PDF, or Google Sheets before analysis.
• How it Works:
1. Connect Tableau to an Excel/CSV/PDF/Google Sheet.
2. On the Data Source Page, select Use Data Interpreter.
3. Tableau automatically detects headers, tables, and sub-tables.
4. It removes unnecessary rows (titles, notes, blank spaces).
5. You can review results in Excel via a generated “Data Interpreter Results” file.
6. If incorrect, you can uncheck the option or manually edit found tables.
Example: If an Excel file has company logos, merged headers, or notes at the top, Data
Interpreter will clean these and correctly identify the data table (columns = fields, rows = records).
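The kind of cleanup described above can be imitated with a very rough heuristic: keep only rows in which every cell is filled, and treat the first such row as the header. This is an assumption made for illustration and not the algorithm Data Interpreter actually uses; the sheet contents and the clean_sheet helper are hypothetical.

```python
# Hypothetical raw spreadsheet rows with a title, a blank row, and a trailing note.
raw_rows = [
    ["Quarterly Sales Report", "", ""],
    ["", "", ""],
    ["Region", "Product", "Sales"],
    ["East", "Chair", "100"],
    ["West", "Desk", "300"],
    ["Note: figures are provisional", "", ""],
]

def clean_sheet(rows):
    """Keep fully populated rows; the first one is the header, the rest are records."""
    full_rows = [r for r in rows if all(cell.strip() for cell in r)]
    header, records = full_rows[0], full_rows[1:]
    return [dict(zip(header, r)) for r in records]

clean_rows = clean_sheet(raw_rows)
print(clean_rows[0])  # {'Region': 'East', 'Product': 'Chair', 'Sales': '100'}
```

A real data table can legitimately contain empty cells, which is one reason this heuristic is far weaker than the actual tool.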

2. Tableau File Types: TWB vs TWBX


TWB (Tableau Workbook File)
• Format: XML document.
• Contains:
o Information about sheets, dashboards, and stories.
o References (links) to external data sources (Excel, SQL, TDE, etc.).
• Does NOT contain actual data.
• Best Use: For work within the same environment where others have access to the original
data source.

TWBX (Tableau Packaged Workbook File)


• Format: A compressed “package” containing TWB + data + images + other resources.
• Contains actual data (in extract form) → so it does not need access to original data source.
• Can be shared easily (like a zipped bundle).
• Can include .tde or .hyper extracts for performance.
• Best Use: For sharing reports with others who may not have access to the original source.

Comparison (Exam-Ready Table)


Feature | TWB | TWBX
Format | XML file | Packaged (zip-like)
Data Included? | No | Yes
Size | Small (only metadata) | Larger (includes data & images)
Sharing | Needs access to original source | Can be shared standalone
Best Use | Internal analysis with shared data source | Sharing with external users

3. How to Create a Packaged Workbook in Tableau


Steps:
1. Open the Tableau workbook (.twb).
2. Go to File → Save As.
3. Choose Tableau Packaged Workbook (.twbx).
4. Tableau compresses workbook + data extracts + images into a single file.
5. Share .twbx → It can be opened in Tableau Desktop or Tableau Reader.
Example: Creating a .twbx file for your professor so they can open your dashboard without
needing your original Excel or SQL database.

Exam-Ready Summary
• Data Interpreter → Cleans messy Excel/CSV/PDF data → identifies headers, sub-tables,
removes junk rows.
• TWB → XML file, contains visualization instructions, no data included, links to original
source.
• TWBX → Packaged file, includes workbook + data + images, ideal for sharing.
• Creating Packaged Workbook → File → Save As → Choose .twbx.

Difference between .tde and .hyper File


1. .TDE (Tableau Data Extract)
• Introduced earlier as Tableau’s extract format.
• Stores data in a compressed, columnar format.
• Good for medium-sized datasets.
• Provides faster queries than live connections but limited scalability.
2. .Hyper (Tableau Hyper Extract)
• Introduced in Tableau 10.5 as the new standard.
• Designed for large, complex datasets.
• Optimized for analytical query processing and parallel execution.
• Handles billions of rows with high performance.
• Supports advanced features like multi-threading, fast refresh, incremental loads.

3. Comparison Table
Feature | .tde (Tableau Data Extract) | .hyper (Hyper Extract)
Introduced | Older extract format | Tableau 10.5 onwards
Data Size | Medium datasets | Large, complex datasets
Performance | Good, but limited for huge data | Very high, optimized for big data
Technology | Single-threaded, older engine | Multi-threaded, new Hyper engine
Use Case | Smaller extracts, legacy workbooks | Modern extracts, enterprise-level analysis

Advanced Data Preparation in Tableau (Master Notes)


1. Joins in Tableau
A Join combines data from two or more tables based on a common field (key).
a) Inner Join
• Returns only matching records between both tables.
• Example: Orders + Customers → Only customers who placed orders.
Orders: A1, Cust1 | Customers: Cust1, Name1 | Result: A1, Cust1, Name1

b) Left Join
• Returns all records from left table, plus matching from right.
• Non-matches → NULL.
• Example: Orders + Customers → All orders included, even if customer info missing.

c) Right Join
• Opposite of Left Join → all records from right table + matching from left.
• Non-matches → NULL.
• Example: Orders (left) + Customers (right) → All customers shown, even those who never ordered.

d) Full Outer Join


• Returns all records from both tables (matched + unmatched).
• Example: Orders + Customers → Includes all orders & all customers, even if not linked.
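The four join types can be compared side by side with a small Python sketch. The join helper is hypothetical and assumes both tables are non-empty lists of dictionaries sharing one key column; the A2/Cust9 order and the Cust2 customer are added only to produce non-matching rows.

```python
# Hypothetical tables; A2 has no matching customer, Cust2 has no order.
orders = [{"OrderID": "A1", "CustID": "Cust1"}, {"OrderID": "A2", "CustID": "Cust9"}]
customers = [{"CustID": "Cust1", "Name": "Name1"}, {"CustID": "Cust2", "Name": "Name2"}]

def join(left, right, key, how="inner"):
    """Join two lists of dicts on `key`; how is 'inner', 'left', 'right' or 'outer'."""
    left_cols = [c for c in left[0] if c != key]
    right_cols = [c for c in right[0] if c != key]
    result, matched_right = [], set()
    for l_row in left:
        matches = [i for i, r_row in enumerate(right) if r_row[key] == l_row[key]]
        for i in matches:
            matched_right.add(i)
            result.append({**l_row, **right[i]})
        # Unmatched left rows are kept for left and full outer joins (NULL = None).
        if not matches and how in ("left", "outer"):
            result.append({**l_row, **{c: None for c in right_cols}})
    # Unmatched right rows are kept for right and full outer joins.
    if how in ("right", "outer"):
        for i, r_row in enumerate(right):
            if i not in matched_right:
                result.append({**{c: None for c in left_cols}, **r_row})
    return result

for how in ("inner", "left", "right", "outer"):
    print(how, len(join(orders, customers, "CustID", how)))
# inner 1, left 2, right 2, outer 3
```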

2. Complex Joins
• Joins involving multiple conditions or tables.
• Example: Orders + Customers + Products → to know which customer bought which
product.
• Tableau supports up to 32 tables in a join.

3. Referential Integrity
• Definition: Ensures foreign keys in one table always exist in another (primary key).
• Use in Tableau:
o Option: Assume Referential Integrity.
o Queries only necessary tables → improves performance.
• Example: Employee Table (DeptID) + Department Table (DeptID).
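Whether referential integrity actually holds can be checked before relying on the option. The sketch below looks for "orphan" rows whose foreign key has no match in the parent table; the employee and department rows and the orphan_rows helper are hypothetical.

```python
# Hypothetical tables; DeptID 99 does not exist in departments.
employees = [
    {"Name": "Asha", "DeptID": 10},
    {"Name": "Ravi", "DeptID": 20},
    {"Name": "Meera", "DeptID": 99},
]
departments = [
    {"DeptID": 10, "DeptName": "Sales"},
    {"DeptID": 20, "DeptName": "Finance"},
]

def orphan_rows(child_rows, parent_rows, key):
    """Return child rows whose foreign key is missing from the parent table."""
    parent_keys = {p[key] for p in parent_rows}
    return [c for c in child_rows if c[key] not in parent_keys]

orphans = orphan_rows(employees, departments, "DeptID")
print(orphans)  # [{'Name': 'Meera', 'DeptID': 99}]
```

An empty result means every foreign key has a matching primary key; a non-empty result means the assumption would be unsafe for this data.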

4. Union
• Definition: Appends rows from multiple tables with identical structure.
• Example: Sales_Q1, Sales_Q2, Sales_Q3, Sales_Q4 → Combined into one Sales table.
• Before Union: Separate sheets (Q1, Q2…).
• After Union: Single table with all records stacked.
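Stacking tables with the same headers can be sketched as follows. The quarterly rows and the union helper are hypothetical; the extra "Table Name" column is added here so that each stacked row can still be traced to its source table.

```python
# Hypothetical quarterly tables with identical headers.
sales_q1 = [{"Region": "East", "Sales": 100}]
sales_q2 = [{"Region": "West", "Sales": 200}, {"Region": "East", "Sales": 150}]

def union(tables):
    """Stack rows from several tables, checking that the headers match."""
    header, stacked = None, []
    for table_name, rows in tables.items():
        for row in rows:
            if header is None:
                header = set(row)
            if set(row) != header:
                raise ValueError(f"{table_name} has different columns")
            stacked.append({**row, "Table Name": table_name})
    return stacked

all_sales = union({"Sales_Q1": sales_q1, "Sales_Q2": sales_q2})
print(len(all_sales))  # 3
print(all_sales[0])    # {'Region': 'East', 'Sales': 100, 'Table Name': 'Sales_Q1'}
```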

5. Data Blending
• Definition: Combines data from different sources at visualization level (not in DB).
• Works via Primary Source (blue) + Secondary Source (orange).
• Needs a common linking field (e.g., Date, Region, ID).
• When Used:
o Different databases (Excel + SQL).
o When join is not possible.
• Example: Blend Excel Sales with SQL Targets on Region field.
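Blending differs from a join in that each source is aggregated separately and the aggregated results are then combined on the linking field. The sketch below assumes that behavior in a simplified form, with the primary source deciding which regions appear; the sample rows and helper names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sources: (region, value) rows from two different systems.
excel_sales = [("East", 100), ("East", 50), ("West", 300)]   # primary source
sql_targets = [("East", 200), ("West", 250), ("North", 100)]  # secondary source

def aggregate(rows):
    """Sum values per region within a single source."""
    totals = defaultdict(int)
    for region, value in rows:
        totals[region] += value
    return dict(totals)

def blend(primary, secondary):
    """Combine per-region aggregates; only regions in the primary source are kept."""
    primary_totals, secondary_totals = aggregate(primary), aggregate(secondary)
    return {
        region: (total, secondary_totals.get(region))
        for region, total in primary_totals.items()
    }

blended = blend(excel_sales, sql_targets)
print(blended)  # {'East': (150, 200), 'West': (300, 250)}
```

"North" is absent from the result because it exists only in the secondary source, which mirrors a left join from the primary source.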

6. Cross-Database Join
• Definition: Directly joins tables from different databases inside Tableau.
• Example: MySQL → Orders table joined with Oracle → Inventory table on ProductID.
• Difference from Blending: Happens at database/query level (not viz level).

Comparison (Join vs Union vs Blending)


Feature | Join | Union | Data Blending
Combines | Columns (side by side) | Rows (stacked) | Results at visualization level
Requirement | Common Key | Same headers | Common field (dimension)
Use Case | Orders + Customers | Quarterly Sales Tables | Excel Sales + SQL Targets

Diagram Guide (For Exam Sketches)


1. Inner Join → Overlap of two circles.
2. Left Join → Entire left circle + overlap.
3. Right Join → Entire right circle + overlap.
4. Outer Join → Both circles combined.
5. Union → Tables stacked vertically.
6. Blending → Two databases (cylinders) linked at visualization.
7. Cross-DB Join → Different DB tables joined inside Tableau.

Exam-Ready Short Points


• Inner Join → Common records only.
• Left Join → All left + matching right.
• Right Join → All right + matching left.
• Outer Join → All records both sides.
• Complex Join → Multiple conditions or tables.
• Referential Integrity → Speeds queries by assuming FK–PK link.
• Union → Stacks rows.
• Data Blending → Combines sources at viz level.
• Cross-Database Join → Joins across different DBs.

1. Calculated Field
• Definition: A Calculated Field is a custom field created in Tableau by applying formulas or
expressions on existing fields.
• Purpose: Helps create new measures or dimensions not directly present in the data source.
• How to Create:
o Menu → Analysis → Create Calculated Field.
• Examples:
o Profit Ratio = [Profit] / [Sales]
o Full Name = [First Name] + " " + [Last Name]
Use Case: To calculate Profit Margin % when only Sales and Profit are available.
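The two example formulas can be mirrored in Python as row-level calculations. The sample row and the add_calculated_fields helper are hypothetical; the guard against zero Sales is an added assumption to avoid division by zero.

```python
# Hypothetical source row with the fields used in the examples.
row = {"First Name": "John", "Last Name": "Smith", "Sales": 200.0, "Profit": 50.0}

def add_calculated_fields(record):
    """Return a copy of the record with two derived fields added."""
    enriched = dict(record)
    # Profit Ratio = [Profit] / [Sales]; None when Sales is zero.
    enriched["Profit Ratio"] = (
        record["Profit"] / record["Sales"] if record["Sales"] else None
    )
    # Full Name = [First Name] + " " + [Last Name]
    enriched["Full Name"] = record["First Name"] + " " + record["Last Name"]
    return enriched

enriched = add_calculated_fields(row)
print(enriched["Profit Ratio"])  # 0.25
print(enriched["Full Name"])     # John Smith
```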

2. Measure Values
• Definition: Measure Values is a special field in Tableau that represents all the numerical
fields (measures) in the dataset.
• Purpose: Allows displaying multiple measures in a single view.
• How it works:
o When you drag “Measure Values” to Rows/Columns, Tableau plots all selected
measures together.
• Examples:
o Displaying Sales, Profit, Discount in one table using Measure Values.
Use Case: To show different KPIs (Sales, Profit, Quantity) side by side in one visualization.

3. Measure Names
• Definition: Measure Names is another special field that contains the names of all measures
in the dataset.
• Purpose: Works together with Measure Values to filter/select which measures to display.
• How it works:
o Drag “Measure Names” to Rows/Columns or Filters → Choose which measures to
show.
• Examples:
o Select only “Sales” and “Profit” to appear, while excluding “Quantity” and
“Discount”.
Use Case: To dynamically switch between measures in dashboards (e.g., toggle between Sales
and Profit).
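One way to picture the pair is as a list of (name, value) rows, where Measure Names supplies the labels and Measure Values supplies the numbers. The totals and the measure_pairs helper below are hypothetical; filtering on the names decides which measures are shown.

```python
# Hypothetical aggregated measures for one view.
totals = {"Sales": 1000, "Profit": 200, "Quantity": 35, "Discount": 12}

def measure_pairs(measures, selected):
    """Return (measure name, measure value) rows for the selected measures only."""
    return [(name, measures[name]) for name in measures if name in selected]

view = measure_pairs(totals, {"Sales", "Profit"})
print(view)  # [('Sales', 1000), ('Profit', 200)]
```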

Summary Table
Concept | Definition | Example | Use Case
Calculated Field | Custom field created using formulas | [Profit]/[Sales] = Profit Ratio | Create new KPIs like Profit Margin %
Measure Values | Holds all numeric measures in dataset | Sales, Profit, Discount shown together | Display multiple measures in one chart
Measure Names | Holds the names of all measures | “Sales”, “Profit”, “Quantity” | Filter which measures to display
