What is a Fact Table?

In data warehousing and analytics databases, a fact table is where you store the actual measurements and metrics of your business. It’s the table that contains the numbers you want to analyze. This could include things like sales amounts, quantities, counts, durations, or any other quantifiable data points.

Fact tables are the foundation of dimensional modeling, the approach most commonly used for organizing data in data warehouses and business intelligence systems.

What Goes in a Fact Table?

A fact table contains two types of columns:

  • Facts are the numeric measurements themselves. These are the values you want to sum, average, count, or otherwise aggregate. In a sales fact table, this might include sales_amount, quantity_sold, discount_applied, and cost. In a website analytics fact table, it could be page_views, session_duration, and bounce_rate.
  • Foreign keys link to dimension tables that provide context about the measurements. These keys tell you who, what, when, where, and why about each fact. A sales fact table might have foreign keys to customer, product, date, and store dimensions.

Each row in a fact table typically represents a single business event or transaction. For example one sale, one page view, one manufacturing run, one customer service call, etc.

Types of Facts

Not all measurements work the same way, and understanding the difference matters for how you query and aggregate your data:

  • Additive facts can be summed across any dimension and produce meaningful results. Sales amounts, quantities, and costs are additive. You can add up sales across all products, all time periods, all customers and it all makes sense.
  • Semi-additive facts can be summed across some dimensions but not others. Account balances and inventory levels are classic examples. You can sum them across different accounts or products, but adding yesterday’s balance to today’s balance doesn’t make sense. For time-based analysis, you typically take snapshots or averages rather than sums.
  • Non-additive facts can’t be meaningfully summed at all. Ratios, percentages, temperatures, and unit prices fall into this category. To get an average unit price across multiple transactions, you need to calculate it from additive facts (total revenue divided by total quantity) rather than averaging the stored unit prices.

Granularity Matters

The granularity of a fact table defines what each row represents. This is one of the most important design decisions you’ll make.

A sales fact table might have one row per line item on an invoice (fine grain), one row per entire transaction (medium grain), or one row per customer per day (coarser grain). This choice affects everything from storage requirements, query patterns, and what questions you can answer.

Finer granularity gives you more flexibility. You can always aggregate detailed data to answer broader questions, but you can’t go the other direction. If you store daily totals, you can’t later decide you need hourly breakdowns.

The tradeoff is storage and processing. A fact table with millions of line items takes more space and requires more computation than one with thousands of daily summaries. Most data warehouses lean toward finer granularity because storage is cheap and flexibility is valuable.

Fact Table Design Patterns

There are several common types of fact tables you’ll encounter:

  • Transaction fact tables record individual business events as they happen. Each row is one sale, one click, one call. These are the most common and have the finest granularity.
  • Periodic snapshot fact tables capture the state of things at regular intervals. End-of-day account balances, weekly inventory levels, or monthly subscription counts. These work well for semi-additive facts and metrics that need point-in-time analysis.
  • Accumulating snapshot fact tables track processes that have a defined beginning and end with multiple milestones in between. An order fulfillment fact table might have columns for order_date, payment_date, ship_date, and delivery_date, with each row updated as the order progresses through the pipeline.
  • Factless fact tables track events where there’s no measurement to record – just that something happened. This could include stuff like student course attendance, product promotions being active, or insurance policy coverage periods. The fact table contains only foreign keys to dimensions, but it can still be useful for counting occurrences and analyzing patterns.

Visualizing Fact Tables

In a typical dimensional model, the fact table is at the center, connected to multiple dimension tables through foreign keys. This creates what’s called a star schema, with the fact table at the center with dimensions radiating outward, like this:

Conceptual diagram of a star schema

Most of the data volume in a data warehouse lives in fact tables. While dimension tables might have thousands or millions of rows, fact tables often have billions. A retail company might have 100,000 products in their product dimension but billions of sales transactions in their fact table.

This size difference is why fact tables need to be designed carefully. Every extra column, every unnecessary index, gets multiplied by billions of rows. Keep fact tables lean so that they contain just the foreign keys and the actual measurements.

Here’s a simple example of a fact table within a star schema:

Example of a fact table within a star schema

Foreign Keys and Surrogate Keys

Fact tables typically use surrogate keys rather than natural business keys for their foreign key references. Instead of storing a 12-character product SKU, you might store an integer product_key that links to the product dimension.

This approach has several benefits. Integer keys take less space (important when you have billions of rows). They’re faster to join. They can also handle slowly changing dimensions better. For example, if a product’s attributes change over time, the surrogate key can point to the appropriate version of the dimensional data.

The fact table doesn’t care what the product SKU is or what the customer’s name is. It just needs to know which dimension records provide that context.

Query Patterns

Most analytical queries follow a similar pattern: join the fact table to relevant dimension tables, filter on dimensional attributes, and aggregate the facts.

“Show me total sales by product category for the Northwest region in Q4” might join the fact table to product, geography, and date dimensions, filter on the specified attributes, and sum the sales_amount.

The fact table is almost always the starting point. You rarely query dimension tables without involving facts, but you’re constantly querying facts with dimensional context.

Design Considerations

When designing a fact table, start by identifying the business process you’re modeling and the measurements that matter. What are people trying to analyze? What metrics drive decisions?

Then determine the granularity. What’s the most detailed level at which you want to capture data? This usually matches the level at which the business process naturally occurs.

Finally, identify the dimensions that provide context. What perspectives do users need to slice and dice the data? Each of these becomes a foreign key in your fact table.

Keep the fact table focused on measurements. Descriptive attributes belong in dimension tables, not here. If you find yourself adding text columns or attributes to a fact table, that’s usually a sign you need a dimension table instead.