Skip to main content

Redshift vs ClickHouse

A Detailed Cloud Data Warehouse Comparison

ON THIS PAGE

## Architecture

The biggest difference among cloud data warehouses are whether they separate storage and compute, how much they isolate data and compute, and what clouds they can run on.

FeatureRedshiftClickHouse
Separation of storage and computeRA3 instances enable separation of compute and storage, but limited workload isolation compared to other platformsYes – SharedMergeTree engine in ClickHouse Cloud enables full separation of storage and compute, with compute-compute separation through Warehouses feature (introduced 2025) allowing multiple isolated compute services sharing the same data
Supported cloud infrastructureAWS onlyAWS, GCP, Azure, cloud service and on-premises
Isolated tenancy – option for dedicated resources• Isolated tenant & resources • Runs in your VPC• Multi-tenant metadata layer
• Isolated tenancy for compute & storage per client in cloud
Control vs abstraction of compute• Configurable cluster size
• Configurable compute types
Configurable cluster size and compute types in ClickHouse Cloud with granular control over nodes (1-128 nodes) and node characteristics. Warehouses feature enables multiple isolated read-only compute environments.
Self-hosted and hybrid deployment optionsLimited hybrid options with Redshift ServerlessSelf-managed deployments available with full control over infrastructure
ACID Compliance and TransactionsACID compliant at table level with some limitations on concurrent operationsLimited ACID compliance with MergeTree engine family.

Redshift has the oldest architecture, being the first Cloud DW in the group. Its architecture wasn't designed to separate storage & compute. While it now has RA3 nodes which allow you to scale compute and only cache the data you need locally, all compute still operates together. You cannot separate and isolate different workloads over the same data, which puts it behind other decoupled storage/compute architectures. Redshift runs as an isolated tenant per customer, and unlike other cloud data warehouses, it is deployed in your VPC. Redshift offers a serverless option which is based on an abstracted unit called Redshift Processing Unit (RPU) ranging from 8 to 512 in increments of 8. Each RPU provides 2 vCPU and 16GB RAM. Thus, 8 RPU is equivalent to 16 vCPU / 128GB RAM. The minimum RPU is 8.

ClickHouse was originally developed at Yandex, the Russian search engine, as an OLAP engine for low latency analytics. It was built as an on-premise solution with coupled storage & compute, and a large variety of tuning options in the form of indexes and and merge trees. ClickHouse's architecture is famous for its focus on performance and low-latency queries. The tradeoff is that it is considered very difficult to work with. SQL support is very limited, and tuning/running it requires significant engineering resources.

## Scalability

There are three big differences among data warehouses and query engines that limit scalability: decoupled storage and compute, dedicated resources, and continuous ingestion.

| Feature | | | --- | | | Elasticity – Scaling for larger data volumes and faster queries | | | Elasticity – Scaling for higher concurrency | |

Redshift is limited in scale because even with RA3, it cannot distribute different workloads across clusters. While it can scale to up to 10 clusters automatically to support query concurrency, it can only handle a maximum of 50 queued queries across all clusters by default.

ClickHouse doesn't offer any dedicated scaling features or mechanisms. While it can deliver linearly scalable performance for some types of queries, scaling itself has to be done manually. Hardware is self-managed in ClickHouse. This means that to scale you would have to provision a cluster and migrate.

## Performance

Performance is the biggest challenge with most data warehouses today. While decoupled storage and compute architectures improved scalability and simplified administration, for most data warehouses it introduced two bottlenecks; storage, and compute. Most modern cloud data warehouses fetch entire partitions over the network instead of just fetching the specific data needed for each query. While many invest in caching, most do not invest heavily in query optimization. Most vendors also have not improved continuous ingestion or semi-structured data analytics performance, both of which are needed for operational and customer-facing use cases.

FeatureRedshiftClickHouse
IndexesNone• Primary indexes • Skipping indexes (minmax, set, bloom filters, ngrambf_v1, tokenbf_v1)
• MergeTree indexes
• Incremental Materialized views
Compute tuningChoice over number of nodes and their typeConfigurable compute resources in cloud offering
Storage formatColumnar & compressed storage (RA3 nodes)Columnar, supports sorted, compressed, encoded & sparsely indexed files with native Apache Iceberg support.
Table-level partition & pruning techniques• No table partitions • User-defined distribution & sort keys are used to optimize for speedPartitioning by date/time and custom partitions with MergeTree indexes.
Result cacheYesYes, results cache with TTL and query condition cache.
Warm cache (SSD)Only with RA3 nodes at partition-level granularityYes, at indexed data-range level granularity
Support for semi-structured data & JSON functions within SQLLimitedYes, including Lambda expressions and native JSON data type (GA in v25.3)
Vector Search and AI CapabilitiesLimited AI capabilities – primarily through integrations• Native vector search capabilities and embeddings
• MCP Server for AI driven analytics
• Natural Language to SQL
• SQL based Inference
Query Optimizations• Basic query optimizer • Materialized views • Result caching • ANALYZE for table statistics • Workload management (WLM) • Automated materialized views (AutoMV) • AI-driven scaling (Serverless preview)• Primary indexes (ORDER BY)
• Data skipping indexes (minmax, set, bloom filters, ngrambf_v1, tokenbf_v1)
• Materialized views
• Projections
• PREWHERE optimization
• Query analysis tools
• Automatic global join reordering (v25.9)
• Enhanced JSON query optimization
• Streaming secondary indices

Redshift does provide a result cache for accelerating repetitive query workloads and also has more tuning options than some others. But it does not deliver much faster compute performance than other cloud data warehouses in benchmarks. Sort keys can be used to optimize performance, but their contribution is limited. There is no support for indexes, and low-latency analytics at large data volumes is hard to achieve. Because Redshift decoupling of storage & compute is limited compared to other cloud data warehouses, it doesn't support isolating workloads, which means performance can degrade under pressure and competition for resources.

ClickHouse is famous for being one of the fastest local runtimes ever built for OLAP workloads. Its columnar storage, compression and indexing capabilities make it a consistent leader in benchmarks. Its lack of support for standard SQL and lack of query optimizer means that it's less suitable for traditional BI workloads, and more suitable for engineering managed workloads. While fast, it requires a lot of tuning and optimization.

## Use cases

There are a host of different analytics use cases that can be supported by a data warehouse. Look at your legacy technologies and their workloads, as well as the new possible use cases, and figure out which ones you will need to support in the next few years.

FeatureRedshiftClickHouse
Low-latency dashboards• Seconds to tens of seconds load times at 100s of GB scale • Can achieve faster performance with Concurrency Scaling and proper tuning• Sub-second load times at TB+ scale with proper indexing
• ClickHouse Cloud reduces engineering overhead with managed service • Proven low-latency performance (120ms at 2500 QPS in benchmarks)
• Purpose-built for low-latency OLAP and real-time analytics
Enterprise BI• Mature and comprehensive Enterprise DW feature set • Extensive integrations with Enterprise BI ecosystem • Strong AWS ecosystem integration• Growing ecosystem with 50+ integrations including major BI tools
• Native MySQL protocol support enables broad BI tool compatibility • Strong SQL compliance with PostgreSQL compatibility
• Best suited for modern analytical workloads and engineering-managed use cases
Data Apps and AI Applications (Customer-facing low-latency high concurrency)• 5 concurrent queries per WLM queue by default (up to 8 queues) • Concurrency Scaling enables thousands of concurrent queries • Seconds-level response times typical • Automatic scaling for burst workloads • Limited AI application support• Sub-second response times at TB+ scale
• Supports 1000 concurrent users per replica
• Strong price-performance on customer-facing applications
• Native vector search and embeddings
Ad hoc• Performance dependent on predefined distribution & sort keys • Elastic Resize enables adding compute resources • Typically subset of data loaded for ad-hoc analysis• Good for ad-hoc queries with ClickHouse Cloud's separated storage/compute architecture
• Join optimizations enable more query complexity
• Strong sampling capabilities (TABLESAMPLE) for exploratory analysis
• Resource management through user quotas prevents query interference
• Materialized views offer performance improvements for common aggregation patterns, ad-hoc users specify directly in SQL

Redshift was originally designed to support traditional internal BI reporting and dashboard use cases for analysts. As such, it is typically used as a general-purpose Enterprise data warehouse. With deep integrations into the AWS ecosystem, it can also leverage AWS ML service, making it also useful for ML projects. However, given the coupling of storage & compute, and the difficulty in delivering low-latency analytics at scale, it is less suited for operational use cases and customer-facing use cases like Data Apps. The coupling of storage and compute, together with the need to predefine sort & dist keys for optimal performance, make it challenging to use for Ad-Hoc analytics.

ClickHouse was not designed to be a data warehouse, but rather a low-latency query execution runtime. Managing it typically requires significant engineering overhead. Hence, it's a good fit for engineering managed operational use cases and customer-facing data apps, where low latency matters. It is not a good fit for a general purpose data warehouse, nor for Ad-Hoc analytics or ELT.