# Druid vs ClickHouse (2025)


## Architecture [#architecture]

The biggest differences among cloud data warehouses are whether they separate storage and compute, how well they isolate data and compute, and which clouds they can run on.

| Feature                                           | Druid                                                                                                                                                                       | ClickHouse                                                                                                                                                                                                                                      |
| ------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Separation of storage and compute                 | No                                                                                                                                                                          | Yes – SharedMergeTree engine in ClickHouse Cloud enables full separation of storage and compute, with compute-compute separation through the Warehouses feature (introduced in 2025), which allows multiple isolated compute services to share the same data |
| Supported cloud infrastructure                    | Can be installed anywhere                                                                                                                                                   | AWS, GCP, Azure, cloud service and on-premises                                                                                                                                                                                                  |
| Isolated tenancy – option for dedicated resources | Single tenant                                                                                                                                                               | • Multi-tenant metadata layer<br />• Isolated tenancy for compute & storage per client in cloud                                                                                                                                                 |
| Control vs abstraction of compute                 | • Complex configuration of compute tier with multiple role-specific nodes<br />• Configurable node count<br />• Configurable compute types (virtual machines or Kubernetes) | Configurable cluster size and compute types in ClickHouse Cloud, with granular control over nodes (1-128 nodes) and node characteristics. The Warehouses feature enables multiple isolated read-only compute environments.                     |
| Self-hosted and hybrid deployment options         | Self-managed deployment required                                                                                                                                            | Self-managed deployments available with full control over infrastructure                                                                                                                                                                        |
| ACID Compliance and Transactions                  | Limited ACID support with eventual consistency                                                                                                                              | Limited ACID compliance with MergeTree engine family.                                                                                                                                                                                           |

**Druid** is an OLAP engine designed for fast real-time analytics. It uses a clustered architecture in which servers host role-specific processes that handle real-time and batch ingestion, indexing, and querying of historical and real-time data. Apache Druid can be deployed on virtual machines or as a Kubernetes-based cluster. Druid does not support a decoupled compute and storage architecture: deep storage (object storage) holds a durable replica of the data, but queries are normally served from segments that data nodes first load onto their local disks.
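The role split can be made concrete with a toy Python model of a broker answering a time-bounded query: it prunes segments by time interval, asks the process hosting each remaining segment for a partial result, and merges those results. This is a simplified sketch, not Druid's implementation; the `Segment` class, server names, and functions are hypothetical, and real segments are columnar files with indexes rather than Python lists.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Toy stand-in for a Druid segment: a time interval plus raw rows."""
    start: int                     # interval start (inclusive), e.g. an hour number
    end: int                       # interval end (exclusive)
    server: str                    # hypothetical name of the process serving it
    rows: list[tuple[int, int]]    # (timestamp, value) pairs


def partial_sum(segment: Segment, q_start: int, q_end: int) -> int:
    """Work done by the data process (historical or real-time) hosting the segment."""
    return sum(v for ts, v in segment.rows if q_start <= ts < q_end)


def broker_query(segments: list[Segment], q_start: int, q_end: int) -> int:
    """Broker role: prune segments by interval, fan out, merge partial results."""
    overlapping = [s for s in segments if s.start < q_end and q_start < s.end]
    return sum(partial_sum(s, q_start, q_end) for s in overlapping)


segments = [
    Segment(0, 12, "historical-1", [(1, 5), (11, 7)]),
    Segment(12, 24, "historical-2", [(13, 2), (23, 4)]),
    Segment(24, 48, "realtime-1", [(24, 1), (30, 9)]),  # still receiving data
]
print(broker_query(segments, 20, 30))  # 5: only rows at t=23 and t=24 qualify
```

The first segment is never consulted for this query, which is the main payoff of time-based segment pruning.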

**ClickHouse** was originally developed at Yandex, the Russian search engine company, as an OLAP engine for low-latency analytics. It was built as an on-premises solution with coupled compute and storage and a large variety of tuning options in the form of indexes and MergeTree table engines. ClickHouse's architecture is known for its focus on performance and low-latency queries. The tradeoff is operational complexity: its SQL dialect departs from standard SQL in places, and tuning and running it requires significant engineering resources.
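The "merge tree" idea can be sketched in a few lines of Python, assuming a simplified model in which each insert writes one immutable part sorted by the table's key and a background step merges sorted parts into larger ones. `ToyMergeTree` and its methods are hypothetical; real ClickHouse merges selected subsets of parts in the background rather than everything at once.

```python
import heapq


class ToyMergeTree:
    """Hypothetical toy model: inserts create sorted parts, merges combine them."""

    def __init__(self) -> None:
        self.parts: list[list[tuple]] = []

    def insert(self, rows: list[tuple]) -> None:
        # Each INSERT becomes its own immutable part, sorted by the key
        # (here simply the tuple order, i.e. the first field first).
        self.parts.append(sorted(rows))

    def merge_all(self) -> None:
        # k-way merge of already-sorted parts: no global re-sort is needed.
        merged = list(heapq.merge(*self.parts))
        self.parts = [merged] if merged else []


table = ToyMergeTree()
table.insert([(3, "c"), (1, "a")])   # first INSERT -> part 1
table.insert([(5, "e"), (2, "b")])   # second INSERT -> part 2
print(len(table.parts))              # 2
table.merge_all()
print(table.parts)                   # [[(1, 'a'), (2, 'b'), (3, 'c'), (5, 'e')]]
```

Because every part is already sorted, merging is a linear streaming pass, which is why write-heavy workloads can keep inserting small parts without paying for a full sort each time.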

## Scalability [#scalability]

Three architectural differences among data warehouses and query engines determine how far they can scale: decoupled storage and compute, dedicated resources, and continuous ingestion.

| Feature                                                         | Druid                                                                                                     | ClickHouse                                                                                                                                                                                                                                     |
| --------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Elasticity – Scaling for larger data volumes and faster queries | Scale-up of nodes requires careful planning and downtime. Addition of new nodes for scale-out is possible | Automatic horizontal and vertical scaling in ClickHouse Cloud with SharedMergeTree architecture. Manual scaling for self-managed deployments with cluster rebalancing capabilities                                                             |
| Elasticity – Scaling for higher concurrency                     | Supports hundreds to hundreds of thousands of queries per second with proper configuration and scaling    | Supports high concurrency with proper resource allocation and configuration. Vertical auto-scaling and horizontal manual scaling. Additional warehouses can scale to zero when idle, incurring no compute charges. The primary service stays on in multi-warehouse configurations. |

**Druid** can handle fast ingestion and high concurrency. Custom sizing and cluster tuning are required to balance the compute, memory, and storage needs of each Druid process and to sustain high concurrency. Druid clusters can be grown by adding nodes, and storage segments are automatically rebalanced across them. Self-hosting Druid on Kubernetes is one way users simplify scaling. Cloud-based managed Druid offerings are also being rolled out, but they are limited in scale and their scaling is not granular.
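Segment rebalancing after a scale-out can be illustrated with a greedy Python sketch that moves segments from the most-loaded node to the least-loaded node until segment counts differ by at most one. This is a deliberate simplification: Druid's Coordinator uses its own balancing strategy that weighs more than raw segment counts, and the node and segment names here are hypothetical.

```python
def rebalance(assignment: dict[str, list[str]]) -> dict[str, list[str]]:
    """Greedy toy balancer (not Druid's algorithm): move one segment at a time
    from the most-loaded node to the least-loaded node until counts are within one."""
    nodes = {name: list(segs) for name, segs in assignment.items()}  # don't mutate input
    while True:
        most = max(nodes, key=lambda n: len(nodes[n]))
        least = min(nodes, key=lambda n: len(nodes[n]))
        if len(nodes[most]) - len(nodes[least]) <= 1:
            return nodes
        nodes[least].append(nodes[most].pop())


cluster = {
    "historical-1": [f"seg-{i}" for i in range(0, 6)],
    "historical-2": [f"seg-{i}" for i in range(6, 12)],
    "historical-3": [],  # newly added node
}
balanced = rebalance(cluster)
print({name: len(segs) for name, segs in balanced.items()})
# {'historical-1': 4, 'historical-2': 4, 'historical-3': 4}
```

Each move shrinks the gap between the busiest and idlest node, so the loop always terminates; a production balancer would also account for segment size, replication, and data locality.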

**ClickHouse**, when self-managed, doesn't offer dedicated scaling mechanisms. While it can deliver linearly scalable performance for some types of queries, scaling itself has to be done manually: hardware is self-managed, so to scale you provision additional nodes or a new cluster and then redistribute or migrate the data.
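To see why manual scale-out usually means moving data, consider hash-based sharding. The Python sketch below assumes each row is routed by `hash(key) % number_of_shards`, a simplification of how a sharding key with equal shard weights routes inserts; `shard_for` and `fraction_moved` are hypothetical helpers. Going from 3 to 4 shards changes the home shard of about three quarters of the keys, and those rows stay on their old shards until someone redistributes them.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Route a key to a shard by a stable hash modulo the shard count."""
    # md5 gives a hash that is stable across runs (Python's hash() is salted).
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "little") % num_shards


def fraction_moved(keys: list[str], old: int, new: int) -> float:
    """Share of keys whose shard changes when the shard count goes old -> new."""
    moved = sum(1 for k in keys if shard_for(k, old) != shard_for(k, new))
    return moved / len(keys)


keys = [f"user-{i}" for i in range(20_000)]
print(f"{fraction_moved(keys, 3, 4):.2f}")  # expected to be close to 0.75
```

The 0.75 follows from the residues modulo 12: a key keeps its shard only when `h % 3 == h % 4`, which holds for 3 of the 12 residues.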

## Performance [#performance]

Performance is the biggest challenge with most data warehouses today.

While decoupled storage and compute architectures improved scalability and simplified administration, for most data warehouses they introduced two bottlenecks: storage and compute. Most modern cloud data warehouses fetch entire partitions over the network instead of only the specific data each query needs. While many invest in caching, few invest heavily in query optimization. Most vendors also have not improved continuous ingestion or semi-structured data analytics performance, both of which are needed for operational and customer-facing use cases.

| Feature                                                      | Druid                                                                                                                        | ClickHouse                                                                                                                                                                                                                                                                                                                                        |
| ------------------------------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Indexes                                                      | Compressed bitmap indexes for data access and roll-ups to manage aggregations                                                | • Primary indexes<br />• Skipping indexes (minmax, set, bloom filters, ngrambf\_v1, tokenbf\_v1)<br />• MergeTree indexes<br />• Incremental materialized views                                                                                                                                                                                   |
| Compute tuning                                               | On-premises, self-managed hardware. Druid requires infrastructure management and leverages commonly available instance types | Configurable compute resources in cloud offering                                                                                                                                                                                                                                                                                                  |
| Storage format                                               | Columnar storage format with time-based sorting                                                                              | Columnar, supports sorted, compressed, encoded & sparsely indexed files with native Apache Iceberg support.                                                                                                                                                                                                                                       |
| Table-level partition & pruning techniques                   | Restrictive time-based partitioning. Can partition based on other secondary columns                                          | Partitioning by date/time and custom partitions with MergeTree indexes.                                                                                                                                                                                                                                                                           |
| Result cache                                                 | Ability to support caching on broker (set to off by default)                                                                 | Yes, results cache with TTL and query condition cache.                                                                                                                                                                                                                                                                                            |
| Warm cache (SSD)                                             | Yes, at much larger segment level granularity                                                                                | Yes, at indexed data-range level granularity                                                                                                                                                                                                                                                                                                      |
| Support for semi-structured data & JSON functions within SQL | Recommend flattening JSON or translate to array prior to loading. No support for JSON parsing at query runtime               | Yes, including Lambda expressions and native JSON data type (GA in v25.3)                                                                                                                                                                                                                                                                         |
| Vector Search and AI Capabilities                            | No native AI or vector search capabilities                                                                                   | • Native vector search capabilities and embeddings<br />• MCP Server for AI driven analytics<br />• Natural Language to SQL<br />• SQL based Inference                                                                                                                                                                                            |
| Query Optimizations                                          | • Compressed bitmap indexes<br />• Roll-up aggregations<br />• Time-based optimization<br />• Query optimization requires manual tuning | • Primary indexes (ORDER BY)<br />• Data skipping indexes (minmax, set, bloom filters, ngrambf\_v1, tokenbf\_v1)<br />• Materialized views<br />• Projections<br />• PREWHERE optimization<br />• Query analysis tools<br />• Automatic global join reordering (v25.9)<br />• Enhanced JSON query optimization<br />• Streaming secondary indices |
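Data skipping indexes work by storing a small summary per block of rows and skipping any block whose summary rules out a match. The Python sketch below models a `minmax` index over fixed-size granules; the granule size, class, and function names are illustrative assumptions, and real ClickHouse skipping indexes are stored on disk per data part with their own `GRANULARITY` setting.

```python
from dataclasses import dataclass

GRANULE_ROWS = 4  # illustrative; ClickHouse's default index_granularity is 8192


@dataclass
class Granule:
    values: list[int]
    lo: int  # minimum value, kept in the skipping index
    hi: int  # maximum value, kept in the skipping index


def build_granules(column: list[int], size: int = GRANULE_ROWS) -> list[Granule]:
    """Split a column into granules and record each granule's min and max."""
    granules = []
    for i in range(0, len(column), size):
        chunk = column[i:i + size]
        granules.append(Granule(chunk, min(chunk), max(chunk)))
    return granules


def scan_with_minmax(granules: list[Granule], lo: int, hi: int) -> tuple[list[int], int]:
    """Return values in [lo, hi] and the number of granules actually read."""
    matches, read = [], 0
    for g in granules:
        if g.hi < lo or g.lo > hi:
            continue  # min/max prove no row can match: skip without reading
        read += 1
        matches.extend(v for v in g.values if lo <= v <= hi)
    return matches, read


column = [1, 3, 2, 4, 10, 12, 11, 13, 20, 22, 21, 23]
granules = build_granules(column)
print(scan_with_minmax(granules, 11, 12))  # ([12, 11], 1)
```

Only one of three granules is read here; the index pays off when values are clustered, and helps little when every granule spans the full value range.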

**Druid** delivers high performance through its columnar storage format, parallel processing, bitmap indexes, and roll-ups. However, Druid recommends a denormalized data model for performance. Joins in Druid are a relatively new feature with various limitations, especially when large datasets must be joined.
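Bitmap indexes turn filters into bitwise operations: each distinct dimension value maps to a bitmap of the rows holding it, and `AND`/`OR` predicates become bitwise `&`/`|`. The Python sketch below uses plain integers as uncompressed bitmaps; Druid stores compressed bitmaps (for example Roaring), and the column names and data here are made up.

```python
from collections import defaultdict


def build_bitmap_index(column: list[str]) -> dict[str, int]:
    """Map each distinct value to a bitmap (a Python int) of the rows holding it."""
    index: dict[str, int] = defaultdict(int)
    for row, value in enumerate(column):
        index[value] |= 1 << row  # set bit `row` for this value
    return dict(index)


def rows_of(bitmap: int) -> list[int]:
    """Decode a bitmap back into row numbers."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


country = ["US", "DE", "US", "FR", "DE", "US"]
device = ["ios", "web", "web", "ios", "ios", "ios"]
country_idx = build_bitmap_index(country)
device_idx = build_bitmap_index(device)

# WHERE country IN ('US', 'DE') AND device = 'ios'
bitmap = (country_idx["US"] | country_idx["DE"]) & device_idx["ios"]
print(rows_of(bitmap))  # [0, 4, 5]
```

The filter is resolved without touching the column values at all, which is why bitmap indexes suit the low-cardinality dimension filters common in Druid dashboards.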

**ClickHouse** is known as one of the fastest local runtimes built for OLAP workloads. Its columnar storage, compression, and indexing capabilities make it a consistent leader in benchmarks. Its departures from standard SQL and historically limited query optimizer (automatic global join reordering only arrived in v25.9) make it less suitable for traditional BI workloads and more suitable for engineering-managed workloads. While fast, it requires significant tuning and optimization.
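Part of that indexing advantage comes from the sparse primary index: rows are stored sorted by the `ORDER BY` key, and the index keeps one mark per granule rather than one entry per row, so a range query can binary-search the marks to find candidate granules. The Python sketch below assumes a tiny granule size and hypothetical helper names; it reads conservatively, as a sparse index must, because it only knows where each granule starts.

```python
import bisect

GRANULARITY = 3  # rows per granule; illustrative (ClickHouse defaults to 8192)


def build_marks(sorted_keys: list[int], granularity: int = GRANULARITY) -> list[int]:
    """Sparse index: keep only the first key of each granule."""
    return sorted_keys[::granularity]


def candidate_granules(marks: list[int], lo: int, hi: int) -> range:
    """Granules that may hold keys in [lo, hi]; conservative by design.

    bisect_left steps back one granule so that copies of `lo` spilling
    over from the previous granule are not missed.
    """
    first = max(bisect.bisect_left(marks, lo) - 1, 0)
    last = bisect.bisect_right(marks, hi) - 1
    return range(first, last + 1)


def range_query(sorted_keys: list[int], marks: list[int], lo: int, hi: int,
                granularity: int = GRANULARITY) -> tuple[list[int], int]:
    """Return keys in [lo, hi] and how many granules were read."""
    hits = []
    granules = candidate_granules(marks, lo, hi)
    for g in granules:
        chunk = sorted_keys[g * granularity:(g + 1) * granularity]
        hits.extend(k for k in chunk if lo <= k <= hi)
    return hits, len(granules)


keys = [1, 2, 3, 5, 8, 9, 12, 15, 20, 21, 30, 31]  # sorted by the ORDER BY key
marks = build_marks(keys)                          # [1, 5, 12, 21]
print(range_query(keys, marks, 9, 14))             # ([9, 12], 2)
```

The index stays small enough to keep in memory (one entry per granule), at the cost of sometimes reading a granule that turns out to contain no matches.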

## Use cases [#use-cases]

There are a host of different analytics use cases that can be supported by a data warehouse. Look at your legacy technologies and their workloads, as well as the new possible use cases, and figure out which ones you will need to support in the next few years.

| Feature                                                                      | Druid                                                                                                                                                                                           | ClickHouse                                                                                                                                                                                                                                                                                                                                                                                                                           |
| ---------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Low-latency dashboards                                                       | • Sub-second load times optimized for time-series and real-time analytics<br />• Built for high-concurrency interactive dashboards<br />• Requires denormalized data model | • Sub-second load times at TB+ scale with proper indexing<br />• ClickHouse Cloud reduces engineering overhead as a managed service<br />• Proven low-latency performance (120ms at 2500 QPS in benchmarks)<br />• Purpose-built for low-latency OLAP and real-time analytics |
| Enterprise BI                                                                | • Limited integrations with traditional Enterprise BI tools<br />• Strong for real-time operational dashboards<br />• Requires specialized visualization tools | • Growing ecosystem with 50+ integrations including major BI tools<br />• Native MySQL protocol support enables broad BI tool compatibility<br />• Strong SQL compliance with PostgreSQL compatibility<br />• Best suited for modern analytical workloads and engineering-managed use cases |
| Data Apps and AI Applications (Customer-facing low-latency high concurrency) | • Built for high concurrency (1000+ QPS) with distributed architecture<br />• Sub-second response times for time-series data<br />• Optimized for real-time operational applications<br />• No AI capabilities | • Sub-second response times at TB+ scale<br />• Supports 1000 concurrent users per replica<br />• Strong price-performance on customer-facing applications<br />• Native vector search and embeddings |
| Ad hoc                                                                       | • Not optimized for ad-hoc queries<br />• Requires predefined roll-ups and data modeling<br />• Limited flexibility for exploratory analysis | • Good for ad-hoc queries with ClickHouse Cloud's separated storage/compute architecture<br />• Join optimizations enable more complex queries<br />• Strong sampling capabilities (the SAMPLE clause) for exploratory analysis<br />• Resource management through user quotas prevents query interference<br />• Materialized views improve performance for common aggregation patterns, but ad-hoc users must query them explicitly in SQL |
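ClickHouse materialized views are incremental because they behave like insert triggers: each newly inserted block is transformed and written into a separate target table, which queries then reference explicitly. A minimal Python sketch of that behavior, assuming hypothetical `Table` and `clicks_per_day_view` names; in ClickHouse the target table would typically be a `SummingMergeTree` or `AggregatingMergeTree` that merges partial aggregates in the background, which the plain dictionary below stands in for.

```python
from collections import defaultdict
from typing import Callable


class Table:
    """Hypothetical source table that fires view callbacks on every insert."""

    def __init__(self) -> None:
        self.rows: list[dict] = []
        self.on_insert: list[Callable[[list[dict]], None]] = []

    def insert(self, block: list[dict]) -> None:
        self.rows.extend(block)
        for view in self.on_insert:
            view(block)  # the view sees only the new block, never the whole table


events = Table()
daily_clicks: dict[str, int] = defaultdict(int)  # stands in for the view's target table


def clicks_per_day_view(block: list[dict]) -> None:
    """Incrementally fold each inserted block into the per-day aggregate."""
    for row in block:
        daily_clicks[row["day"]] += row["clicks"]


events.on_insert.append(clicks_per_day_view)
events.insert([{"day": "2025-01-01", "clicks": 3}, {"day": "2025-01-02", "clicks": 1}])
events.insert([{"day": "2025-01-01", "clicks": 2}])
print(dict(daily_clicks))  # {'2025-01-01': 5, '2025-01-02': 1}
```

The aggregate is never recomputed from scratch, which keeps dashboard queries cheap; the flip side is that an ad-hoc query against `events` gains nothing unless it is rewritten to read the aggregate instead.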

**Druid** is designed as an OLAP engine that provides fast access to aggregations over large volumes of data. It is typically used for customer-facing analytics and streaming data processing, usually as an add-on alongside other data warehousing products that are efficient at scaling, joining, and filtering large volumes of data. It is not a suitable replacement for a data warehouse.
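Roll-up is a large part of what makes those aggregations fast: at ingestion, Druid can truncate timestamps to a query granularity and pre-aggregate rows that share the same dimension values, so queries scan fewer rows. The Python sketch below makes simplifying assumptions (one dimension, `count` and `bytes` metrics, hourly buckets); its field and function names are hypothetical, though `queryGranularity` is the name of the corresponding Druid ingestion setting.

```python
from collections import defaultdict


def rollup(rows: list[tuple[int, str, int]], granularity_s: int = 3600) -> dict:
    """Pre-aggregate raw (timestamp, country, bytes) events by (bucket, country)."""
    agg: dict = defaultdict(lambda: {"count": 0, "bytes": 0})
    for ts, country, nbytes in rows:
        bucket = ts - ts % granularity_s  # truncate like a queryGranularity of one hour
        key = (bucket, country)
        agg[key]["count"] += 1
        agg[key]["bytes"] += nbytes
    return dict(agg)


raw = [
    (0, "US", 100), (120, "US", 50), (3599, "DE", 10),
    (3600, "US", 70), (3700, "US", 30),
]
rolled = rollup(raw)
print(len(raw), "->", len(rolled))  # 5 -> 3
```

The tradeoff is that individual events are no longer recoverable: any filter on a column that was not kept as a dimension, or on a finer time range than the bucket, cannot be answered from the rolled-up data.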

**ClickHouse** was not designed to be a data warehouse, but rather a low-latency query execution runtime. Managing it typically requires significant engineering overhead. Hence, it is a good fit for engineering-managed operational use cases and customer-facing data apps where low latency matters. It is not a good fit for a general-purpose data warehouse, nor for ad-hoc analytics or ELT.