Denormalization

Denormalization intentionally duplicates or precomputes data to reduce query work.

It can speed up reads, but it increases write complexity and the risk of stale or inconsistent data.

Core Concepts

The typical path to a denormalized design:

Normalized model → frequent, costly read path → add a denormalized summary table/column → faster reads → a sync strategy is now needed on writes.
| Benefit | Tradeoff |
| --- | --- |
| Faster hot-path reads | More write logic |
| Fewer runtime joins | Possible stale/duplicate data |
| Simpler reporting queries | Maintenance overhead |
SELECT c.customer_id, c.name, SUM(o.total) AS total_spent
FROM customers_denorm AS c
INNER JOIN orders_denorm AS o
ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name
ORDER BY c.customer_id;
| customer_id | name | total_spent |
| --- | --- | --- |
| 1 | Ada | 120.0 |
| 2 | Alan | 75.0 |

Code Examples

-- Setup normalized source tables.
CREATE TABLE customers_denorm (
    customer_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);

CREATE TABLE orders_denorm (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    total REAL NOT NULL
);

INSERT INTO customers_denorm (customer_id, name)
VALUES (1, 'Ada'), (2, 'Alan');

INSERT INTO orders_denorm (order_id, customer_id, total)
VALUES (1, 1, 80.0), (2, 1, 40.0), (3, 2, 75.0);
Expected output
Normalized source tables populated.
-- Create denormalized summary table for fast reads.
CREATE TABLE customer_summary_denorm (
    customer_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    total_spent REAL NOT NULL
);

INSERT INTO customer_summary_denorm (customer_id, name, total_spent)
SELECT c.customer_id, c.name, SUM(o.total) AS total_spent
FROM customers_denorm AS c
INNER JOIN orders_denorm AS o
ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name;
Expected output
Summary table created and filled with pre-aggregated totals.
-- Fast reporting query on denormalized table.
SELECT customer_id, name, total_spent
FROM customer_summary_denorm
ORDER BY customer_id;
| customer_id | name | total_spent |
| --- | --- | --- |
| 1 | Ada | 120.0 |
| 2 | Alan | 75.0 |
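Because the summary table duplicates data that also lives in the source tables, it is worth having a query that detects drift between the two. The sketch below rebuilds the article's schema in an in-memory SQLite database via Python's `sqlite3` module (a convenient test harness, not part of the article's SQL) and runs a staleness check before and after a write that skips the refresh:

```python
import sqlite3

# Minimal sketch using the article's schema; checks whether the denormalized
# summary has drifted from the live aggregate it was built from.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers_denorm (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders_denorm (order_id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL, total REAL NOT NULL);
INSERT INTO customers_denorm VALUES (1, 'Ada'), (2, 'Alan');
INSERT INTO orders_denorm VALUES (1, 1, 80.0), (2, 1, 40.0), (3, 2, 75.0);
CREATE TABLE customer_summary_denorm (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL, total_spent REAL NOT NULL);
INSERT INTO customer_summary_denorm (customer_id, name, total_spent)
SELECT c.customer_id, c.name, SUM(o.total)
FROM customers_denorm AS c
JOIN orders_denorm AS o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name;
""")

# Any row returned here means a stored total disagrees with the live total.
STALENESS_CHECK = """
SELECT s.customer_id, s.total_spent AS stored, live.total_spent AS actual
FROM customer_summary_denorm AS s
JOIN (SELECT customer_id, SUM(total) AS total_spent
      FROM orders_denorm GROUP BY customer_id) AS live
  ON s.customer_id = live.customer_id
WHERE s.total_spent <> live.total_spent;
"""

before = conn.execute(STALENESS_CHECK).fetchall()
print(before)  # [] -> summary is consistent

# A write that skips the summary refresh makes the summary stale.
conn.execute("INSERT INTO orders_denorm VALUES (4, 2, 25.0)")
after = conn.execute(STALENESS_CHECK).fetchall()
print(after)  # [(2, 75.0, 100.0)] -> customer 2's stored total is out of date
```

Running a check like this on a schedule (or after bulk loads) turns "possible stale data" from a silent failure into a measurable one.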

SQLite-Specific Nuances

SQLite Nuance

SQLite handles joins efficiently for many workloads; denormalization should be justified by measured bottlenecks, not assumed upfront.
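One lightweight way to do that measurement is `EXPLAIN QUERY PLAN`, which shows how SQLite will execute the hot query before you decide to denormalize it. A sketch, again using the article's table names through Python's `sqlite3` module; note that the exact plan text varies between SQLite versions:

```python
import sqlite3

# Inspect how SQLite plans the aggregate join from the article before
# assuming it needs a denormalized summary table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers_denorm (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders_denorm (order_id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL, total REAL NOT NULL);
""")

plan = conn.execute("""
EXPLAIN QUERY PLAN
SELECT c.customer_id, c.name, SUM(o.total)
FROM customers_denorm AS c
JOIN orders_denorm AS o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name;
""").fetchall()

# Each row's last column is a human-readable plan step,
# e.g. "SCAN orders_denorm" (wording depends on the SQLite version).
for step in plan:
    print(step[-1])
```

If the plan already uses an index search on realistic data volumes and the query is fast under load, denormalizing buys little and still costs you the sync logic.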

Common Pitfalls / Best Practices

Pitfall

Adding denormalized columns/tables without a reliable refresh strategy, leading to stale reporting data.

Best Practice

Document ownership of denormalized data and implement explicit refresh logic (scheduled rebuild, triggers, or controlled update pipeline).
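As one instance of the trigger option, the sketch below keeps `customer_summary_denorm` in sync on inserts into `orders_denorm`. It assumes a summary row already exists for every customer; a production version would also need triggers for `UPDATE` and `DELETE`. Python's `sqlite3` module is used only as the harness:

```python
import sqlite3

# Trigger-based refresh sketch for the article's schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers_denorm (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders_denorm (order_id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL, total REAL NOT NULL);
INSERT INTO customers_denorm VALUES (1, 'Ada'), (2, 'Alan');
INSERT INTO orders_denorm VALUES (1, 1, 80.0), (2, 1, 40.0), (3, 2, 75.0);

CREATE TABLE customer_summary_denorm (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL, total_spent REAL NOT NULL);
INSERT INTO customer_summary_denorm (customer_id, name, total_spent)
SELECT c.customer_id, c.name, SUM(o.total)
FROM customers_denorm AS c
JOIN orders_denorm AS o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name;

-- Every new order immediately bumps the matching summary row.
CREATE TRIGGER orders_denorm_after_insert
AFTER INSERT ON orders_denorm
BEGIN
    UPDATE customer_summary_denorm
    SET total_spent = total_spent + NEW.total
    WHERE customer_id = NEW.customer_id;
END;
""")

conn.execute("INSERT INTO orders_denorm VALUES (4, 2, 25.0)")
totals = conn.execute(
    "SELECT customer_id, total_spent FROM customer_summary_denorm ORDER BY customer_id"
).fetchall()
print(totals)  # [(1, 120.0), (2, 100.0)]
```

The trade here is explicit: every order insert now also pays for a summary update, which is exactly the "more write logic" cost from the table above.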

Quick Challenge

Create a denormalized table daily_sales_summary(day, total_sales) from orders_denorm (assume you add a date column), then query it directly.

Solution
CREATE TABLE daily_sales_summary (
    day TEXT PRIMARY KEY,
    total_sales REAL NOT NULL
);

-- Example refresh pattern (assuming orders_denorm has order_day):
INSERT INTO daily_sales_summary (day, total_sales)
SELECT order_day AS day, SUM(total) AS total_sales
FROM orders_denorm
GROUP BY order_day;
Expected output
Pre-aggregated daily totals become query-ready for fast reporting.
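A common refresh pattern for a summary like this is a full rebuild inside one transaction, so readers never observe a half-built table. The sketch below (Python `sqlite3` as the harness; the `order_day` column is the one the challenge asks you to assume) wraps the delete-and-reload in a single transaction:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders_denorm (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    total REAL NOT NULL,
    order_day TEXT NOT NULL  -- the date column assumed by the challenge
);
INSERT INTO orders_denorm VALUES
    (1, 1, 80.0, '2024-01-01'),
    (2, 1, 40.0, '2024-01-01'),
    (3, 2, 75.0, '2024-01-02');
CREATE TABLE daily_sales_summary (day TEXT PRIMARY KEY, total_sales REAL NOT NULL);
""")

def refresh_daily_sales(conn):
    # "with conn" runs both statements in one transaction: readers see
    # either the old summary or the new one, never an empty table.
    with conn:
        conn.execute("DELETE FROM daily_sales_summary")
        conn.execute("""
            INSERT INTO daily_sales_summary (day, total_sales)
            SELECT order_day, SUM(total) FROM orders_denorm GROUP BY order_day
        """)

refresh_daily_sales(conn)
rows = conn.execute("SELECT * FROM daily_sales_summary ORDER BY day").fetchall()
print(rows)  # [('2024-01-01', 120.0), ('2024-01-02', 75.0)]
```

Rerunning `refresh_daily_sales` after new orders arrive is the "scheduled rebuild" strategy from the best-practice note above; it trades refresh latency for much simpler write logic than triggers.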