# Denormalization
Denormalization intentionally duplicates or precomputes data to reduce query work.
It can speed up reads, but it increases write complexity and the risk of inconsistent data.
## Core Concepts
```mermaid
flowchart LR
    A[Normalized model] --> B[Frequent costly read path]
    B --> C[Add denormalized summary/column]
    C --> D[Faster reads]
    D --> E[Need sync strategy on writes]
```
| Benefit | Tradeoff |
|---|---|
| Faster hot-path reads | More write logic |
| Fewer runtime joins | Possible stale/duplicate data |
| Simpler reporting queries | Maintenance overhead |
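The "more write logic" tradeoff in the table above can be sketched with Python's built-in `sqlite3` module. The `record_order` helper below is hypothetical (not from this article): it shows that once a summary exists, every order write must also update the cached total, atomically, or readers can observe inconsistent data.

```python
import sqlite3

# Minimal sketch: schema mirrors the article's tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers_denorm (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders_denorm (order_id INTEGER PRIMARY KEY,
                            customer_id INTEGER NOT NULL, total REAL NOT NULL);
CREATE TABLE customer_summary_denorm (customer_id INTEGER PRIMARY KEY,
                                      name TEXT NOT NULL, total_spent REAL NOT NULL);
INSERT INTO customers_denorm VALUES (1, 'Ada');
INSERT INTO customer_summary_denorm VALUES (1, 'Ada', 0.0);
""")

def record_order(order_id, customer_id, total):
    """Write the order row and the precomputed total in one transaction."""
    with conn:  # both statements commit together, or neither does
        conn.execute(
            "INSERT INTO orders_denorm (order_id, customer_id, total) VALUES (?, ?, ?)",
            (order_id, customer_id, total),
        )
        conn.execute(
            "UPDATE customer_summary_denorm SET total_spent = total_spent + ? "
            "WHERE customer_id = ?",
            (total, customer_id),
        )

record_order(1, 1, 80.0)
record_order(2, 1, 40.0)
print(conn.execute(
    "SELECT total_spent FROM customer_summary_denorm WHERE customer_id = 1"
).fetchone()[0])  # 120.0
```

Without the transaction, a crash between the two statements would leave the summary out of sync; this is exactly the extra write logic the table warns about.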
**Normalized Query**

```sql
SELECT c.customer_id, c.name, SUM(o.total) AS total_spent
FROM customers_denorm AS c
INNER JOIN orders_denorm AS o
  ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name
ORDER BY c.customer_id;
```

| customer_id | name | total_spent |
|---|---|---|
| 1 | Ada | 120.0 |
| 2 | Alan | 75.0 |

**Denormalized Query**

```sql
SELECT customer_id, name, total_spent
FROM customer_summary_denorm
ORDER BY customer_id;
```

| customer_id | name | total_spent |
|---|---|---|
| 1 | Ada | 120.0 |
| 2 | Alan | 75.0 |
## Code Examples
```sql
-- Set up the normalized source tables.
CREATE TABLE customers_denorm (
  customer_id INTEGER PRIMARY KEY,
  name TEXT NOT NULL
);

CREATE TABLE orders_denorm (
  order_id INTEGER PRIMARY KEY,
  customer_id INTEGER NOT NULL,
  total REAL NOT NULL
);

INSERT INTO customers_denorm (customer_id, name)
VALUES (1, 'Ada'), (2, 'Alan');

INSERT INTO orders_denorm (order_id, customer_id, total)
VALUES (1, 1, 80.0), (2, 1, 40.0), (3, 2, 75.0);
```
| Expected output |
|---|
| Normalized source tables populated. |
```sql
-- Create the denormalized summary table for fast reads.
CREATE TABLE customer_summary_denorm (
  customer_id INTEGER PRIMARY KEY,
  name TEXT NOT NULL,
  total_spent REAL NOT NULL
);

INSERT INTO customer_summary_denorm (customer_id, name, total_spent)
SELECT c.customer_id, c.name, SUM(o.total) AS total_spent
FROM customers_denorm AS c
INNER JOIN orders_denorm AS o
  ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name;
```
| Expected output |
|---|
| Summary table created and filled with pre-aggregated totals. |
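Once the summary exists, it can silently drift from its source tables. A periodic consistency check catches this; below is a sketch using Python's `sqlite3` module (the check query is illustrative, not from this article), rebuilding the article's tables in memory and comparing each cached total against the live aggregate.

```python
import sqlite3

# Rebuild the article's tables and summary in memory.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers_denorm (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders_denorm (order_id INTEGER PRIMARY KEY,
                            customer_id INTEGER NOT NULL, total REAL NOT NULL);
INSERT INTO customers_denorm VALUES (1, 'Ada'), (2, 'Alan');
INSERT INTO orders_denorm VALUES (1, 1, 80.0), (2, 1, 40.0), (3, 2, 75.0);

CREATE TABLE customer_summary_denorm (customer_id INTEGER PRIMARY KEY,
                                      name TEXT NOT NULL, total_spent REAL NOT NULL);
INSERT INTO customer_summary_denorm
SELECT c.customer_id, c.name, SUM(o.total)
FROM customers_denorm c JOIN orders_denorm o USING (customer_id)
GROUP BY c.customer_id, c.name;
""")

# Rows where the cached total disagrees with the recomputed aggregate.
stale = conn.execute("""
SELECT s.customer_id, s.total_spent, SUM(o.total)
FROM customer_summary_denorm s
JOIN orders_denorm o USING (customer_id)
GROUP BY s.customer_id
HAVING s.total_spent <> SUM(o.total)
""").fetchall()
print(stale)  # [] when the summary is in sync
```

An empty result means the summary is trustworthy; any rows returned identify exactly which customers need a rebuild.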
```sql
-- Fast reporting query on the denormalized table.
SELECT customer_id, name, total_spent
FROM customer_summary_denorm
ORDER BY customer_id;
```
| customer_id | name | total_spent |
|---|---|---|
| 1 | Ada | 120.0 |
| 2 | Alan | 75.0 |
## SQLite-Specific Nuances

> **SQLite Nuance:** SQLite handles joins efficiently for many workloads; denormalization should be justified by measured bottlenecks, not assumed upfront.
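One way to measure before denormalizing is SQLite's `EXPLAIN QUERY PLAN`, which shows how the planner executes the join. A minimal sketch via Python's `sqlite3` module, reusing the article's table names (the exact plan text varies by SQLite version):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers_denorm (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders_denorm (order_id INTEGER PRIMARY KEY,
                            customer_id INTEGER NOT NULL, total REAL NOT NULL);
""")

# Ask the planner how it would run the aggregation join.
plan = conn.execute("""
EXPLAIN QUERY PLAN
SELECT c.customer_id, c.name, SUM(o.total)
FROM customers_denorm c
JOIN orders_denorm o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name
""").fetchall()

# Each row is (id, parent, notused, detail); detail reads e.g.
# "SCAN o" or "SEARCH c USING INTEGER PRIMARY KEY (rowid=?)".
for row in plan:
    print(row[3])
```

If the plan already shows indexed `SEARCH` steps and the query is fast enough under realistic data volumes, the denormalized summary may not be worth its maintenance cost.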
## Common Pitfalls / Best Practices

**Pitfall:** Adding denormalized columns or tables without a reliable refresh strategy, leading to stale reporting data.

**Best Practice:** Document ownership of denormalized data and implement explicit refresh logic (scheduled rebuild, triggers, or a controlled update pipeline).
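Triggers are one of the refresh strategies mentioned above. A sketch via Python's `sqlite3` module: an `AFTER INSERT` trigger keeps the cached total current as orders arrive (the trigger name `trg_orders_summary` and the insert-then-update shape are illustrative choices, not from this article).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers_denorm (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders_denorm (order_id INTEGER PRIMARY KEY,
                            customer_id INTEGER NOT NULL, total REAL NOT NULL);
CREATE TABLE customer_summary_denorm (customer_id INTEGER PRIMARY KEY,
                                      name TEXT NOT NULL, total_spent REAL NOT NULL);
INSERT INTO customers_denorm VALUES (1, 'Ada');

-- Seed a summary row if missing, then fold the new order into the total.
CREATE TRIGGER trg_orders_summary AFTER INSERT ON orders_denorm
BEGIN
  INSERT OR IGNORE INTO customer_summary_denorm (customer_id, name, total_spent)
  SELECT NEW.customer_id, c.name, 0.0
  FROM customers_denorm c WHERE c.customer_id = NEW.customer_id;

  UPDATE customer_summary_denorm
  SET total_spent = total_spent + NEW.total
  WHERE customer_id = NEW.customer_id;
END;
""")

# Writers only touch orders_denorm; the trigger maintains the summary.
conn.execute("INSERT INTO orders_denorm VALUES (1, 1, 80.0)")
conn.execute("INSERT INTO orders_denorm VALUES (2, 1, 40.0)")
print(conn.execute(
    "SELECT total_spent FROM customer_summary_denorm WHERE customer_id = 1"
).fetchone()[0])  # 120.0
```

Triggers keep the summary fresh without application changes, but a full set would also need `AFTER UPDATE` and `AFTER DELETE` handlers, so many teams prefer a scheduled rebuild for simplicity.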
## Quick Challenge

Create a denormalized table `daily_sales_summary(day, total_sales)` from `orders_denorm` (assume you add a date column), then query it directly.

**Solution**

```sql
CREATE TABLE daily_sales_summary (
  day TEXT PRIMARY KEY,
  total_sales REAL NOT NULL
);

-- Example refresh pattern (assuming orders_denorm has order_day):
INSERT INTO daily_sales_summary (day, total_sales)
SELECT order_day AS day, SUM(total) AS total_sales
FROM orders_denorm
GROUP BY order_day;
```
| Expected output |
|---|
| Pre-aggregated daily totals become query-ready for fast reporting. |