What is a Dimension Table?

In data warehousing and analytics systems, dimension tables provide context for the measurements stored in fact tables. While fact tables contain the numbers you want to analyze, dimension tables tell you what those numbers mean. This could be things like who was involved, what product it was, when it happened, where it occurred, etc.

Basically, dimension tables are the “descriptive” side of dimensional modeling, containing the attributes and characteristics that let you filter, group, and label your data in meaningful ways.

What Goes in a Dimension Table?

Dimension tables are typically wide, with many columns containing descriptive attributes. A customer dimension might include name, email, phone, address, city, state, zip code, customer segment, registration date, and loyalty status. A product dimension could have SKU, product name, description, brand, category, subcategory, color, size, weight, and supplier.

These tables usually have a primary key that fact tables reference through foreign keys. This is often a surrogate key (a simple integer) rather than a natural business key.

Unlike fact tables where most columns are numeric, dimension tables are mostly text and categorical data. These are the attributes you use in WHERE clauses to filter data and in GROUP BY clauses to organize results.

The Role of Dimension Tables

Dimension tables answer the context questions about your facts:

  • Who was involved? Customer, employee, vendor, or user dimensions.
  • What happened? Product, service, or event type dimensions.
  • When did it occur? Date and time dimensions.
  • Where did it take place? Location, geography, or facility dimensions.
  • Why or how? Reason codes, transaction types, or method dimensions.

Every fact in your fact table is surrounded by these dimensional perspectives. A single sales transaction might connect to customer, product, date, store, and promotion dimensions, each adding layers of meaning to the raw sales amount.

Dimension Table Characteristics

Dimension tables have several defining characteristics that set them apart from fact tables:

  • They’re relatively small: While fact tables might contain billions of rows, dimension tables might only have thousands to millions. For example a retailer might have billions of transactions but only 100,000 products and a few million customers.
  • They’re wide rather than tall: It’s common for dimension tables to have 20, 50, or even 100+ columns. You’re trying to capture every relevant attribute that someone might want to filter or group by.
  • They change slowly: Product attributes, customer information, and organizational structures don’t change every second like transactions do. This is why they’re often called “slowly changing dimensions“.
  • They’re denormalized: Unlike operational databases where you might split customer information across multiple normalized tables, dimension tables typically keep all related attributes together in one place for query simplicity.

Slowly Changing Dimensions

One of the trickier aspects of dimension tables is handling changes over time. When a customer moves to a new address or a product changes categories, you need a strategy for managing that change while preserving historical accuracy.

There are several approaches, known as slowly changing dimension (SCD) types. Here are the most common ones:

  • Type 1 simply overwrites the old value with the new one. This is simple but loses history. If a customer moves, you update their address and that’s it. Historical analysis will show all their past orders as if they always lived at the new address.
  • Type 2 creates a new row for each change, preserving full history. Each row gets effective start and end dates (or a current flag), and the surrogate key changes. This lets you analyze data as it actually was at any point in time. Most data warehouses use Type 2 for important dimensional changes.
  • Type 3 adds columns to track both old and new values. You might have current_address and previous_address columns. This gives you limited history but keeps the structure simpler than Type 2.

Other types like Type 0 (fixed), Type 4 (history table), and Type 6 (hybrid of 1, 2, 3) offer variations for different historical tracking needs, balancing accuracy with complexity. The following table outlines these, along with typical use cases:

TypeActionHistory Preserved?Best Use Case
Type 0Retain OriginalNoFixed values like “Date of Birth” or “Original Start Date.”
Type 1OverwriteNoCorrecting typos or tracking things where history doesn’t matter (e.g., a phone number).
Type 2Add New RowYes (Full)Tracking a customer’s address or a product’s price history. This is widely considered the gold standard.
Type 3Add New ColumnYes (Partial)When you only care about the “current” vs. the “previous” value (e.g., sales territory).
Type 4History TableYes (Separate)High-volume changes where you want to keep the main table small and fast.
Type 5Hybrid (4+1)Yes (Separate)Large dimensions where you need the current profile value linked directly in the base record without joining the history table. Rarely used in practice.
Type 6Hybrid (1+2+3)Yes (Full)Complex reporting where you need to group by both historical and current values.

The choice depends on your business needs. For example, do users need to see historical data as it was at the time, or is current information sufficient?

Common Dimension Types

Certain dimensions appear in almost every data warehouse:

  • Date dimensions are nearly universal. Rather than just storing dates as timestamps, date dimensions include columns for day of week, month, quarter, year, fiscal period, whether it’s a holiday or weekend, and more. This makes time-based analysis much easier. For example, you can filter to “Q4 weekdays” without complex date calculations.
  • Time dimensions handle time of day separately from dates, with attributes like hour, minute, AM/PM, business hours flag, and time period groupings.
  • Geographic dimensions contain location hierarchies like city, state/province, country, region, etc. They might include postal codes, sales territories, time zones, and demographic information.
  • Customer and product dimensions are specific to your business but nearly always present. These tend to be the largest and most complex dimensions, with many attributes describing different aspects of customers or products.

Junk Dimensions

Sometimes you have a bunch of low-cardinality flags and indicators that don’t warrant their own dimension tables. This might include things like payment method, order priority, shipping type, or transaction status.

Rather than adding these directly to the fact table or creating tiny dimension tables for each one, you can create a “junk dimension” that combines them. This dimension contains every possible combination of these flags. It reduces clutter in the fact table while keeping the flags available for analysis.

Degenerate Dimensions

Occasionally, you might have dimensional data that makes sense to store directly in the fact table rather than creating a separate dimension table. Order numbers and invoice numbers are classic examples. They’re identifiers that provide context but don’t have any attributes of their own to query.

These are called degenerate dimensions. They look like dimension keys in the fact table, but there’s no corresponding dimension table. They can be useful for drilling down to transaction details or grouping facts by the identifier.

Role-Playing Dimensions

Sometimes the same dimension table plays multiple roles in relation to a fact table. A date dimension might be used for order_date, ship_date, and delivery_date, with all pointing to the same underlying date dimension table but serving different purposes.

Rather than creating three separate copies of the date dimension, you use the same table with different foreign keys in the fact table. In queries, you alias it differently for each role to keep things clear.

Dimension Table Design

When designing dimension tables, be sure to include every attribute that users might want to filter or group by. It’s better to have too many columns than to force users to go back to source systems for attributes you left out.

Use meaningful, business-friendly names. Dimension tables are often queried directly by analysts and business users, so customer_segment is better than cust_seg_cd.

Consider adding computed or derived attributes that make analysis easier. In a date dimension for example, you might decide to include is_weekend or days_until_end_of_quarter even though these could be calculated from the base date. In a product dimension, you might include is_high_value flags or category groupings that analysts use frequently.

Keep dimension tables as flat as possible. Resist the urge to normalize them into multiple tables. The query simplicity of having all attributes in one place is usually worth the storage cost of some redundancy.

Dimensions and Query Performance

Because dimension tables are relatively small and queried frequently, they’re often fully cached in memory by modern database systems. This makes joins between fact tables and dimensions fast, even when you’re joining to multiple dimensions.

The descriptive nature of dimension tables also makes them perfect for indexing. Users frequently filter on dimensional attributes (e.g., “show me sales for premium customers in the electronics category”), and indexes on these columns make those queries fast.

Dimension tables are where users spend time exploring and understanding their data. A well-designed dimension table with clear, comprehensive attributes makes self-service analytics possible. A poorly designed one forces users to constantly ask for help or gives them incomplete views of their business.