When ClickHouse CPU Explodes: A Practical Lesson in OLAP vs OLTP Design
Abstract
ClickHouse is an extremely fast analytical database—but it is not a key-value store.
This article walks through a real-world performance incident where a seemingly harmless query (LIMIT 1) caused massive CPU usage, explains why it happened, and presents a systemic fix using proper data-responsibility separation between OLAP and OLTP systems.
If you are using ClickHouse for event tracking, affiliate analytics, or high-volume logging, this will save you from a painful production outage.
The Symptom
The system showed sustained high CPU usage on the ClickHouse server:
- CPU usage: ~60%
- Disk IO: low
- Memory usage: normal
- No long-running queries visible
At first glance, nothing looked obviously wrong.
However, inspecting system.query_log revealed a pattern:
- Queries finished very fast (~50–80 ms)
- Each query scanned millions of rows
- The same query was executed very frequently
This combination is lethal:
Fast queries × huge scans × high QPS = CPU exhaustion
The Query That Looked Innocent
1 | SELECT |
At face value:
LIMIT 1- Single equality filter
- Should be fast, right?
Wrong.
Why This Breaks ClickHouse
ClickHouse is an OLAP database.
Its performance model depends heavily on the table’s ORDER BY (sorting key).
Example table definition:
1 | ENGINE = MergeTree |
Key observation:
system_click_idis not part of the ORDER BY key
What ClickHouse Must Do
- It cannot locate rows by
system_click_id - It must scan the entire partition
LIMIT 1is applied after scanning and filtering
In practice:
- ~8 million rows scanned
- ~200–300 MB read
- Per query
Repeat this dozens or hundreds of times per second → CPU meltdown.
The Root Cause (Not a Bug)
This was not a ClickHouse bug.
It was a data-responsibility violation.
The system was using ClickHouse for:
- ✅ Analytics (good)
- ❌ Single-row lookup by unique ID (bad)
OLAP vs OLTP: The Missing Layer
The system lacked a proper lookup layer.
Two fundamentally different access patterns were mixed:
| Access Pattern | Characteristics | Correct Storage |
|---|---|---|
| Single-row lookup | High QPS, low latency, exact key | OLTP / KV |
| Aggregation & reporting | Large scans, batch queries | OLAP |
ClickHouse excels at the second—and performs terribly at the first.
The Correct Architecture
High-Level Separation
1 | ┌──────────────────┐ |
The Minimal Fix (No Big Rewrite)
1. Keep Raw Events in ClickHouse
ClickHouse remains the source of truth for analytics:
- Clicks
- Conversions
- Rollups
- Daily statistics
Nothing is removed.
2. Add a Lightweight Lookup Table (OLTP)
A new table is introduced in an OLTP database (e.g., PostgreSQL):
1 | CREATE TABLE clicks_lookup ( |
Characteristics:
- Very small rows
- Written once per click
- Indexed by primary key
- Designed for O(1) lookups
3. Adjust the Data Flow
On Click
1 | User click → |
On Conversion
1 | Conversion callback → |
Key rule:
Conversion processing must never query ClickHouse.
Why This Works
- OLTP handles high-QPS point lookups efficiently
- ClickHouse focuses purely on analytics
- CPU usage drops immediately
- System becomes horizontally scalable
This is not an optimization—it is correct system design.
Common Anti-Patterns to Avoid
❌ Using ClickHouse for:
WHERE unique_id = ?LIMIT 1lookups- High-frequency API queries
- OLTP-style joins
✅ Use ClickHouse for:
- Time-range queries
- Aggregations
- Rollups
- Reporting and analytics
Takeaway
ClickHouse is not slow.
It is precise about what it is good at.
Most ClickHouse “performance problems” are actually architecture problems.
If your system starts showing unexplained CPU spikes:
- Check what you are querying
- Not just how fast it runs
Designing for the right access pattern will always outperform any tuning knob.
Final Thought
This incident marked a transition point:
from “it works”
to “it scales”
And that transition almost always requires separating OLTP and OLAP responsibilities.
P.S.
This article is very subjective. If you do not feel comfortable viewing it, please close it as soon as possible.
If you think my article can help you, you can subscribe to this site by using RSS.
When ClickHouse CPU Explodes: A Practical Lesson in OLAP vs OLTP Design
https://iiiyu.com/2026/01/09/When-ClickHouse-CPU-Explodes-A-Practical-Lesson-in-OLAP-vs-OLTP-Design/
