
Firebolt vs. ClickHouse Cloud: Key Differentiations

ClickHouse = Query Accelerator that is NOT suitable for many data warehouse workloads

Firebolt (is much more) = Query Accelerator + Data Warehouse

  • Engineered for data-intensive apps
  • ACID compliant
  • Strong and global consistency
  • Multi-writer engines
  • Everything-decoupled architecture (storage, metadata, compute)
  • Multidimensional elasticity
  • Data-warehouse-grade SQL

SQL Capabilities & Query Expressiveness

ClickHouse lacks the full SQL expressiveness expected in data warehouse workloads.

  • ❌ Does not support automatic subquery decorrelation, so complex nested queries fail or perform poorly.
  • ❌ Limited optimizer support for join reordering and predicate pushdown across large join sets.
  • ❌ “Fails to execute” 11 of 22 TPC-H queries as of v24.6.

✅ Rich SQL expressiveness:

  • ✅ Support for core data warehouse SQL concepts such as correlated subqueries and lateral joins, enabling developers to write rich, declarative queries that perform efficiently.
  • ✅ Mature planner that combines rule-based, cost-based, and history-based optimizations, with automatic decorrelation of subqueries, join reordering, and complex join optimizations.
  • ✅ Supports 100% of TPC-H queries — validated and benchmarked.
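To make the decorrelation point concrete, here is a minimal sketch (using Python's stdlib sqlite3 and a hypothetical customers/orders schema) of a correlated subquery and the equivalent join rewrite a decorrelating optimizer would produce. A planner that performs this rewrite automatically can parallelize the aggregate-plus-join form; one that re-executes the subquery per outer row cannot.

```python
import sqlite3

# Hypothetical schema for illustration: customers and their orders.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE orders (customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'EU'), (2, 'US');
INSERT INTO orders VALUES (1, 10.0), (1, 30.0), (2, 5.0);
""")

# Correlated subquery: references the outer row (c.id) on every iteration.
correlated = """
SELECT c.id,
       (SELECT SUM(o.amount) FROM orders o WHERE o.customer_id = c.id)
FROM customers c ORDER BY c.id;
"""

# Decorrelated rewrite: the same result as an aggregate plus a join,
# a shape a distributed planner can execute in parallel.
decorrelated = """
SELECT c.id, s.total
FROM customers c
LEFT JOIN (SELECT customer_id, SUM(amount) AS total
           FROM orders GROUP BY customer_id) s
  ON s.customer_id = c.id
ORDER BY c.id;
"""

print(con.execute(correlated).fetchall())    # [(1, 40.0), (2, 5.0)]
print(con.execute(decorrelated).fetchall())  # [(1, 40.0), (2, 5.0)]
```

Both queries return identical results; the difference is purely in how well each shape can be optimized and distributed.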

Scalable Query Processing

Distributed execution is limited:

  • Limited to two stages: the first is distributed, the second brings everything to a single node
  • This is NOT scalable and causes OOMs for big joins, high-cardinality aggregations, window functions, etc.
  • Adding more nodes won't help in these cases

Complex ELT workloads (large fact-to-fact joins, GROUP BY aggregations) can lead to OOM.

Designed for true distributed execution:

  • ✅ Plans can have arbitrary number of stages
  • ✅ Connected by efficient shuffle operator
  • ✅ Big joins, high-cardinality aggregates, etc. are scalable because they utilize the power of ALL compute nodes in the cluster
  • ✅ Execution scales with adding more nodes
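The shuffle idea above can be sketched in a few lines. This is a simplified single-process model (worker partitions are plain lists, not real nodes): rows are hash-partitioned on the join key so matching keys land on the same worker, each worker joins only its partition, and the union of per-worker results equals the global join. No worker ever holds the full input, which is why adding workers adds capacity.

```python
from collections import defaultdict

def shuffle(rows, key, n_workers):
    """Hash-partition rows by join key so that equal keys
    land on the same worker (a minimal shuffle operator)."""
    parts = defaultdict(list)
    for row in rows:
        parts[hash(row[key]) % n_workers].append(row)
    return parts

def local_hash_join(left, right, key):
    """Each worker joins only its own partition."""
    index = defaultdict(list)
    for l in left:
        index[l[key]].append(l)
    return [(l, r) for r in right for l in index[r[key]]]

# Toy fact-to-fact join on 'k': 4 distinct keys, each appearing
# twice per side, so the global join has 4 * 2 * 2 = 16 pairs.
A = [{"k": i % 4, "a": i} for i in range(8)]
B = [{"k": i % 4, "b": i * 10} for i in range(8)]
n = 3
pa, pb = shuffle(A, "k", n), shuffle(B, "k", n)
result = [pair for w in range(n)
          for pair in local_hash_join(pa[w], pb[w], "k")]
print(len(result))  # 16
```

Real engines stream partitions over the network instead of lists in memory, but the invariant is the same: shuffle guarantees co-location of join keys, so each stage stays local and scales with the worker count.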

✅ Supports complex ELT workloads:

  • Scales to handle transformations that exceed the bounds of the physical infrastructure
  • ELT into Firebolt supports large fact-to-fact joins and complex GROUP BY aggregations; by adding more compute nodes, ELT workloads scale efficiently

Everything Decoupled Architecture

❌ While compute and storage can be decoupled, metadata management remains tightly coupled with the compute layer. As a result, the primary compute service must remain active at all times to maintain metadata consistency and availability, leading to higher operational costs.

Source: ClickHouse docs

❌ With multiple read-write services, inserts in one read-write cluster (service) can prevent another read-write service from idling.

Source: ClickHouse docs

❌ No support for scaling compute elastically based on bursty workloads and variable concurrency; static cluster provisioning leads to resource contention or overprovisioning.

✅ Built with native separation of storage, compute, and metadata: every compute engine (cluster) comes with full read and write capability. Any change to data or metadata is immediately visible across every engine without any synchronization (global consistency).

✅ Supports true elastic scaling:

  • Auto-start and auto-stop unused engines to reduce cost
  • Scale up or down compute per engine based on workload size
  • Add or remove engine clusters dynamically to handle concurrency bursts
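As an illustration of the kind of sizing decision this enables, here is a hypothetical autoscaling rule (not Firebolt's actual algorithm; `per_cluster_capacity` and the clamping bounds are invented for the example): pick a cluster count from current concurrency, and drop to zero when idle so the engine can auto-stop.

```python
def clusters_needed(concurrent_queries, per_cluster_capacity=10,
                    min_clusters=1, max_clusters=8):
    """Hypothetical sizing rule: one cluster per `per_cluster_capacity`
    concurrent queries, clamped to [min_clusters, max_clusters].
    Zero load means the engine can auto-stop entirely."""
    if concurrent_queries == 0:
        return 0  # auto-stop: no cost while idle
    needed = -(-concurrent_queries // per_cluster_capacity)  # ceiling division
    return max(min_clusters, min(max_clusters, needed))

print(clusters_needed(0))    # 0 (auto-stopped)
print(clusters_needed(7))    # 1
print(clusters_needed(35))   # 4
print(clusters_needed(200))  # 8 (capped at max_clusters)
```

The point is the shape of the policy: because clusters can be added and removed dynamically, capacity tracks the concurrency curve instead of being provisioned for the peak.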

Built-in workload isolation executes high concurrency at scale, with multiple compute engines running in parallel. Engines for ingestion, transformation, serving dashboards, or APIs can run independently without interfering with each other.

ACID Transactions & Consistency

A (Atomicity)

Atomic only within a partition; an INSERT that spans partitions is not atomic.

Source: ClickHouse docs

C (Consistency)

Eventual consistency between replicas.

Source: ClickHouse docs, ClickHouse docs

I (Isolation)

  • Snapshot isolation for writers, but readers can see dirty reads (i.e., read uncommitted).

Source: ClickHouse docs, ClickHouse docs

  • The query result cache is not consistent and delivers stale reads.

Source: ClickHouse docs

D (Durability)

  • The default configuration may lose data on power failures.

Source: ClickHouse VLDB paper

A (Atomicity)

All DDL and DML operations are always atomic.

C (Consistency)

Strong consistency across all nodes of all clusters of all engines in the region. Any node always sees the most recent committed data.

I (Isolation)

Snapshot isolation for both writers and readers.

D (Durability)

A transaction commits only after all data has been stored on S3, which provides a 99.999999999% durability guarantee.

ClickBench numbers are coming soon...
