# Redshift vs ClickHouse (/comparison/redshift-vs-clickhouse)



<Video id="DHLOVj2qrMI" />

## Architecture [#architecture]

The biggest difference among cloud data warehouses are whether they separate storage and compute, how much they isolate data and compute, and what clouds they can run on.

| Feature                                           | Redshift                                                                                                           | ClickHouse                                                                                                                                                                                                                                      |
| ------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Separation of storage and compute                 | RA3 instances enable separation of compute and storage, but limited workload isolation compared to other platforms | Yes – SharedMergeTree engine in ClickHouse Cloud enables full separation of storage and compute, with compute-compute separation through Warehouses feature (introduced 2025) allowing multiple isolated compute services sharing the same data |
| Supported cloud infrastructure                    | AWS only                                                                                                           | AWS, GCP, Azure, cloud service and on-premises                                                                                                                                                                                                  |
| Isolated tenancy – option for dedicated resources | • Isolated tenant & resources • Runs in your VPC                                                                   | • Multi-tenant metadata layer<br />• Isolated tenancy for compute & storage per client in cloud                                                                                                                                                 |
| Control vs abstraction of compute                 | • Configurable cluster size<br />• Configurable compute types                                                      | Configurable cluster size and compute types in ClickHouse Cloud with granular control over nodes (1-128 nodes) and node characteristics. Warehouses feature enables multiple isolated read-only compute environments.                           |
| Self-hosted and hybrid deployment options         | Limited hybrid options with Redshift Serverless                                                                    | Self-managed deployments available with full control over infrastructure                                                                                                                                                                        |
| ACID Compliance and Transactions                  | ACID compliant at table level with some limitations on concurrent operations                                       | Limited ACID compliance with MergeTree engine family.                                                                                                                                                                                           |

**Redshift** has the oldest architecture, being the first Cloud DW in the group. Its architecture wasn't designed to separate storage & compute. While it now has RA3 nodes which allow you to scale compute and only cache the data you need locally, all compute still operates together. You cannot separate and isolate different workloads over the same data, which puts it behind other decoupled storage/compute architectures. Redshift runs as an isolated tenant per customer, and unlike other cloud data warehouses, it is deployed in your VPC. Redshift offers a serverless option which is based on an abstracted unit called Redshift Processing Unit (RPU) ranging from 8 to 512 in increments of 8. Each RPU provides 2 vCPU and 16GB RAM. Thus, 8 RPU is equivalent to 16 vCPU / 128GB RAM. The minimum RPU is 8.

**ClickHouse** was originally developed at Yandex, the Russian search engine, as an OLAP engine for low latency analytics. It was built as an on-premise solution with coupled storage & compute, and a large variety of tuning options in the form of indexes and and merge trees. ClickHouse's architecture is famous for its focus on performance and low-latency queries. The tradeoff is that it is considered very difficult to work with. SQL support is very limited, and tuning/running it requires significant engineering resources.

## Scalability [#scalability]

There are three big differences among data warehouses and query engines that limit scalability: decoupled storage and compute, dedicated resources, and continuous ingestion.

\| Feature | |
\| --- | |
\| Elasticity – Scaling for larger data volumes and faster queries | |
\| Elasticity – Scaling for higher concurrency | |

**Redshift** is limited in scale because even with RA3, it cannot distribute different workloads across clusters. While it can scale to up to 10 clusters automatically to support query concurrency, it can only handle a maximum of 50 queued queries across all clusters by default.

**ClickHouse** doesn't offer any dedicated scaling features or mechanisms. While it can deliver linearly scalable performance for some types of queries, scaling itself has to be done manually. Hardware is self-managed in ClickHouse. This means that to scale you would have to provision a cluster and migrate.

## Performance [#performance]

Performance is the biggest challenge with most data warehouses today.
While decoupled storage and compute architectures improved scalability and simplified administration, for most data warehouses it introduced two bottlenecks; storage, and compute. Most modern cloud data warehouses fetch entire partitions over the network instead of just fetching the specific data needed for each query. While many invest in caching, most do not invest heavily in query optimization. Most vendors also have not improved continuous ingestion or semi-structured data analytics performance, both of which are needed for operational and customer-facing use cases.

| Feature                                                      | Redshift                                                                                                                                                                                                  | ClickHouse                                                                                                                                                                                                                                                                                                                                        |
| ------------------------------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Indexes                                                      | None                                                                                                                                                                                                      | • Primary indexes • Skipping indexes (minmax, set, bloom filters, ngrambf\_v1, tokenbf\_v1)<br />• MergeTree indexes<br />• Incremental Materialized views                                                                                                                                                                                        |
| Compute tuning                                               | Choice over number of nodes and their type                                                                                                                                                                | Configurable compute resources in cloud offering                                                                                                                                                                                                                                                                                                  |
| Storage format                                               | Columnar & compressed storage (RA3 nodes)                                                                                                                                                                 | Columnar, supports sorted, compressed, encoded & sparsely indexed files with native Apache Iceberg support.                                                                                                                                                                                                                                       |
| Table-level partition & pruning techniques                   | • No table partitions • User-defined distribution & sort keys are used to optimize for speed                                                                                                              | Partitioning by date/time and custom partitions with MergeTree indexes.                                                                                                                                                                                                                                                                           |
| Result cache                                                 | Yes                                                                                                                                                                                                       | Yes, results cache with TTL and query condition cache.                                                                                                                                                                                                                                                                                            |
| Warm cache (SSD)                                             | Only with RA3 nodes at partition-level granularity                                                                                                                                                        | Yes, at indexed data-range level granularity                                                                                                                                                                                                                                                                                                      |
| Support for semi-structured data & JSON functions within SQL | Limited                                                                                                                                                                                                   | Yes, including Lambda expressions and native JSON data type (GA in v25.3)                                                                                                                                                                                                                                                                         |
| Vector Search and AI Capabilities                            | Limited AI capabilities – primarily through integrations                                                                                                                                                  | • Native vector search capabilities and embeddings<br />• MCP Server for AI driven analytics<br />• Natural Language to SQL<br />• SQL based Inference                                                                                                                                                                                            |
| Query Optimizations                                          | • Basic query optimizer • Materialized views • Result caching • ANALYZE for table statistics • Workload management (WLM) • Automated materialized views (AutoMV) • AI-driven scaling (Serverless preview) | • Primary indexes (ORDER BY)<br />• Data skipping indexes (minmax, set, bloom filters, ngrambf\_v1, tokenbf\_v1)<br />• Materialized views<br />• Projections<br />• PREWHERE optimization<br />• Query analysis tools<br />• Automatic global join reordering (v25.9)<br />• Enhanced JSON query optimization<br />• Streaming secondary indices |

**Redshift** does provide a result cache for accelerating repetitive query workloads and also has more tuning options than some others. But it does not deliver much faster compute performance than other cloud data warehouses in benchmarks. Sort keys can be used to optimize performance, but their contribution is limited. There is no support for indexes, and low-latency analytics at large data volumes is hard to achieve. Because Redshift decoupling of storage & compute is limited compared to other cloud data warehouses, it doesn't support isolating workloads, which means performance can degrade under pressure and competition for resources.

**ClickHouse** is famous for being one of the fastest local runtimes ever built for OLAP workloads. Its columnar storage, compression and indexing capabilities make it a consistent leader in benchmarks. Its lack of support for standard SQL and lack of query optimizer means that it's less suitable for traditional BI workloads, and more suitable for engineering managed workloads. While fast, it requires a lot of tuning and optimization.

## Use cases [#use-cases]

There are a host of different analytics use cases that can be supported by a data warehouse. Look at your legacy technologies and their workloads, as well as the new possible use cases, and figure out which ones you will need to support in the next few years.

| Feature                                                                      | Redshift                                                                                                                                                                                                                                       | ClickHouse                                                                                                                                                                                                                                                                                                                                                                                                                           |
| ---------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Low-latency dashboards                                                       | • Seconds to tens of seconds load times at 100s of GB scale • Can achieve faster performance with Concurrency Scaling and proper tuning                                                                                                        | • Sub-second load times at TB+ scale with proper indexing<br />• ClickHouse Cloud reduces engineering overhead with managed service • Proven low-latency performance (120ms at 2500 QPS in benchmarks)<br />• Purpose-built for low-latency OLAP and real-time analytics                                                                                                                                                             |
| Enterprise BI                                                                | • Mature and comprehensive Enterprise DW feature set • Extensive integrations with Enterprise BI ecosystem • Strong AWS ecosystem integration                                                                                                  | • Growing ecosystem with 50+ integrations including major BI tools<br />• Native MySQL protocol support enables broad BI tool compatibility • Strong SQL compliance with PostgreSQL compatibility<br />• Best suited for modern analytical workloads and engineering-managed use cases                                                                                                                                               |
| Data Apps and AI Applications (Customer-facing low-latency high concurrency) | • 5 concurrent queries per WLM queue by default (up to 8 queues) • Concurrency Scaling enables thousands of concurrent queries • Seconds-level response times typical • Automatic scaling for burst workloads • Limited AI application support | • Sub-second response times at TB+ scale<br />• Supports 1000 concurrent users per replica<br />• Strong price-performance on customer-facing applications<br />• Native vector search and embeddings                                                                                                                                                                                                                                |
| Ad hoc                                                                       | • Performance dependent on predefined distribution & sort keys • Elastic Resize enables adding compute resources • Typically subset of data loaded for ad-hoc analysis                                                                         | • Good for ad-hoc queries with ClickHouse Cloud's separated storage/compute architecture<br />• Join optimizations enable more query complexity<br />• Strong sampling capabilities (TABLESAMPLE) for exploratory analysis<br />• Resource management through user quotas prevents query interference<br />• Materialized views offer performance improvements for common aggregation patterns, ad-hoc users specify directly in SQL |

**Redshift** was originally designed to support traditional internal BI reporting and dashboard use cases for analysts. As such, it is typically used as a general-purpose Enterprise data warehouse. With deep integrations into the AWS ecosystem, it can also leverage AWS ML service, making it also useful for ML projects. However, given the coupling of storage & compute, and the difficulty in delivering low-latency analytics at scale, it is less suited for operational use cases and customer-facing use cases like Data Apps. The coupling of storage and compute, together with the need to predefine sort & dist keys for optimal performance, make it challenging to use for Ad-Hoc analytics.

**ClickHouse** was not designed to be a data warehouse, but rather a low-latency query execution runtime. Managing it typically requires significant engineering overhead. Hence, it's a good fit for engineering managed operational use cases and customer-facing data apps, where low latency matters. It is not a good fit for a general purpose data warehouse, nor for Ad-Hoc analytics or ELT.
