## Architecture
The biggest differences among cloud data warehouses are whether they separate storage and compute, how much they isolate data and compute, and which clouds they can run on.
| Feature | ClickHouse | Athena |
|---|---|---|
| Separation of storage and compute | Yes – SharedMergeTree engine in ClickHouse Cloud enables full separation of storage and compute, with compute-compute separation through Warehouses feature (introduced 2025) allowing multiple isolated compute services sharing the same data | Yes, serverless with optional provisioned capacity. Workloads can be isolated through Workgroups and Capacity Reservations |
| Supported cloud infrastructure | AWS, GCP, Azure, cloud service and on-premises | AWS only |
| Isolated tenancy – option for dedicated resources | • Multi-tenant metadata layer • Isolated tenancy for compute & storage per client in cloud | • Multi-tenant pooled resources by default • Dedicated compute resources available via Provisioned Capacity • VPC endpoint connections supported |
| Control vs abstraction of compute | Configurable cluster size and compute types in ClickHouse Cloud with granular control over nodes (1-128 nodes) and node characteristics. Warehouses feature enables multiple isolated read-only compute environments. | • Serverless by default with no infrastructure control • Optional Provisioned Capacity allows dedicated DPU allocation (minimum 24 DPUs) • Two pricing models: on-demand ($5/TB scanned) or provisioned ($0.30/DPU-hour) |
| Self-hosted and hybrid deployment options | Self-managed deployments available with full control over infrastructure | No self-hosted options – serverless only |
| ACID Compliance and Transactions | Limited ACID compliance with MergeTree engine family. | No ACID compliance – eventual consistency model |
ClickHouse was originally developed at Yandex, the Russian search engine, as an OLAP engine for low-latency analytics. It was built as an on-premises solution with coupled storage & compute, and a large variety of tuning options in the form of indexes and merge trees. ClickHouse's architecture is famous for its focus on performance and low-latency queries. The tradeoff is that it is considered very difficult to work with. SQL support is very limited, and tuning and running it requires significant engineering resources.
Athena is serverless and built on a decoupled storage and compute architecture that queries data directly in S3, without the need to ingest or copy the data. It runs in multi-tenancy with shared resources. Users do not have control over the compute resources Athena chooses to allocate per query from the shared resource pool. Users who need additional or dedicated resources can reserve dedicated processing capacity in the form of Data Processing Units (DPUs), with each DPU providing 4 vCPU and 16 GB RAM. DPU allocation ranges from 24 to 1000 per region.
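To make the DPU figures concrete, here is a minimal sketch that turns a reservation size into total vCPU, RAM, and hourly cost, and estimates how much data on-demand queries would have to scan per hour to cost the same. It uses only the numbers quoted on this page (4 vCPU and 16 GB RAM per DPU, 24 to 1000 DPUs, $5/TB scanned, $0.30/DPU-hour). The function names are illustrative, and the sketch ignores minimum billing periods, so treat the result as a rough guide and check current AWS pricing.

```python
# Figures quoted in this comparison; they are assumptions to verify, not live pricing.
VCPU_PER_DPU = 4
RAM_GB_PER_DPU = 16
MIN_DPUS = 24
MAX_DPUS = 1000
ON_DEMAND_USD_PER_TB = 5.00
PROVISIONED_USD_PER_DPU_HOUR = 0.30


def reservation_capacity(dpus: int) -> dict:
    """Return total vCPU, RAM, and hourly cost for a capacity reservation."""
    if not MIN_DPUS <= dpus <= MAX_DPUS:
        raise ValueError(f"DPU count must be between {MIN_DPUS} and {MAX_DPUS}")
    return {
        "vcpu": dpus * VCPU_PER_DPU,
        "ram_gb": dpus * RAM_GB_PER_DPU,
        "usd_per_hour": dpus * PROVISIONED_USD_PER_DPU_HOUR,
    }


def break_even_tb_per_hour(dpus: int) -> float:
    """TB scanned per hour at which on-demand cost equals the reservation cost."""
    return reservation_capacity(dpus)["usd_per_hour"] / ON_DEMAND_USD_PER_TB


if __name__ == "__main__":
    print(reservation_capacity(MIN_DPUS))
    print(break_even_tb_per_hour(MIN_DPUS))
```

Under these assumptions, the minimum 24-DPU reservation provides 96 vCPU and 384 GB RAM for roughly $7.20 per hour, which on-demand pricing would reach at about 1.44 TB scanned per hour.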
## Scalability
There are three big differences among data warehouses and query engines that limit scalability: decoupled storage and compute, dedicated resources, and continuous ingestion.
| Feature | ClickHouse | Athena |
|---|---|---|
| Elasticity – Scaling for larger data volumes and faster queries | Automatic horizontal and vertical scaling in ClickHouse Cloud with SharedMergeTree architecture. Manual scaling for self-managed deployments with cluster rebalancing capabilities | • Fully abstracted on-demand scaling • Provisioned Capacity allows manual scaling of DPUs for predictable performance • Capacity reservations can be adjusted with minimum 1-hour billing periods |
| Elasticity – Scaling for higher concurrency | Supports high concurrency with proper resource allocation and configuration. Vertical auto-scaling and horizontal manual scaling. Additional warehouses can idle to zero billing. Primary service always on in multi-warehouse configurations. | • Default limit of 25 concurrent DML queries and 20 DDL queries (adjustable via service quotas) • Provisioned Capacity enables higher concurrency with dedicated DPUs • Query queuing available when capacity is exceeded |
Self-managed ClickHouse doesn't offer any dedicated scaling features or mechanisms. While it can deliver linearly scalable performance for some types of queries, scaling itself has to be done manually. Because the hardware is self-managed, scaling means provisioning a new cluster and migrating to it. As the table above notes, automatic scaling is available only in ClickHouse Cloud.
Athena is a shared multi-tenant resource, with no guarantees on the amount or availability of the resources allocated for your queries. From a data volume perspective, it can scale to large volumes, but large data volumes can suffer from very long run times and frequent timeouts. Default query concurrency is capped at 25 DML and 20 DDL queries, although these limits can be raised through service quotas. If scalability is a top priority, Athena is probably not the best choice.
## Performance
Performance is the biggest challenge with most data warehouses today. While decoupled storage and compute architectures improved scalability and simplified administration, for most data warehouses they introduced two bottlenecks: storage and compute. Most modern cloud data warehouses fetch entire partitions over the network instead of fetching only the specific data needed for each query. While many invest in caching, most do not invest heavily in query optimization. Most vendors also have not improved continuous ingestion or semi-structured data analytics performance, both of which are needed for operational and customer-facing use cases.
| Feature | ClickHouse | Athena |
|---|---|---|
| Indexes | • Primary indexes • Skipping indexes (minmax, set, bloom filters, ngrambf_v1, tokenbf_v1) • MergeTree indexes • Incremental Materialized views | No traditional indexes – relies on partition pruning and data organization in S3. Uses columnar formats and compression for optimization |
| Compute tuning | Configurable compute resources in cloud offering | • No compute tuning in on-demand mode • Provisioned Capacity allows DPU allocation control (4 vCPU and 16GB RAM per DPU) • Minimum 24 DPUs with scaling in 4-DPU increments |
| Storage format | Columnar, supports sorted, compressed, encoded & sparsely indexed files with native Apache Iceberg support. | Supports multiple formats: Parquet, ORC, Avro, JSON, CSV, TSV on S3. Native support for open table formats including Apache Iceberg, Apache Hudi, and Delta Lake |
| Table-level partition & pruning techniques | Partitioning by date/time and custom partitions with MergeTree indexes. | • User-defined table-level partitions with Hive-style partitioning • Pruning at partition level • Partition projection for advanced performance optimization • Supports open table formats with built-in partitioning |
| Result cache | Yes, results cache with TTL and query condition cache. | Query result caching for up to 30 days with configurable retention. Results reuse supported across workgroups |
| Warm cache (SSD) | Yes, at indexed data-range level granularity | No local caching – queries data directly from S3. Relies on S3's performance characteristics and intelligent tiering |
| Support for semi-structured data & JSON functions within SQL | Yes, including Lambda expressions and native JSON data type (GA in v25.3) | Yes, comprehensive JSON support including Lambda expressions, array functions, and native nested data handling |
| Vector Search and AI Capabilities | • Native vector search capabilities and embeddings • MCP Server for AI driven analytics • Natural Language to SQL • SQL based Inference | No native AI or vector search capabilities |
| Query Optimizations | • Primary indexes (ORDER BY) • Data skipping indexes (minmax, set, bloom filters, ngrambf_v1, tokenbf_v1) • Materialized views • Projections • PREWHERE optimization • Query analysis tools • Automatic global join reordering (v25.9) • Enhanced JSON query optimization • Streaming secondary indices | • Cost-based optimizer (CBO) in Athena engine v3 • Query result caching (up to 30 days) • Partition projection for advanced optimization • CTAS for precomputed queries • Join reordering and aggregation pushdown • Automatic parallel query execution • Support for columnar formats (Parquet, ORC) • Integration with AWS Glue Data Catalog |
ClickHouse is famous for being one of the fastest local runtimes ever built for OLAP workloads. Its columnar storage, compression and indexing capabilities make it a consistent leader in benchmarks. Its lack of support for standard SQL and lack of a query optimizer mean that it's less suitable for traditional BI workloads, and more suitable for engineering-managed workloads. While fast, it requires a lot of tuning and optimization.
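The idea behind a min-max data-skipping index, one of the index types listed in the table above, can be sketched in a few lines. This is a simplified illustration, not ClickHouse's actual implementation: the `Granule` class, the function names, and the granule size are all hypothetical. Each block of rows stores its minimum and maximum value, and a range query skips any block whose summary proves that no row can match.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Granule:
    """A block of consecutive rows plus the min/max summary stored for it."""
    rows: List[int]
    lo: int
    hi: int


def build_granules(values: List[int], granule_size: int) -> List[Granule]:
    """Split values into fixed-size blocks and record min/max per block."""
    granules = []
    for start in range(0, len(values), granule_size):
        chunk = values[start:start + granule_size]
        granules.append(Granule(rows=chunk, lo=min(chunk), hi=max(chunk)))
    return granules


def scan_range(granules: List[Granule], lower: int, upper: int) -> Tuple[List[int], int]:
    """Return the matching values and the number of granules actually read."""
    matches: List[int] = []
    granules_read = 0
    for g in granules:
        if g.hi < lower or g.lo > upper:
            continue  # the summary proves no row in this block can match
        granules_read += 1
        matches.extend(v for v in g.rows if lower <= v <= upper)
    return matches, granules_read


if __name__ == "__main__":
    granules = build_granules(list(range(100)), granule_size=10)
    matches, granules_read = scan_range(granules, 42, 47)
    print(matches, f"read {granules_read} of {len(granules)} granules")
```

The benefit depends heavily on how the data is sorted: with values in order, as here, most blocks are skipped, while randomly ordered data would give wide min-max ranges and little pruning. This is part of why tuning matters so much.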
Athena, like Presto, is designed to query data where it is, sacrificing storage-compute optimizations. This makes it very convenient for easy and immediate querying, but at the expense of performance. This typically puts Athena behind cloud data warehouses in terms of performance. Athena still does relatively well in performance benchmarks, especially when external storage is managed by experts. While it supports partitions, there is no support for indexing. Combined with the fact that resources are pooled from a shared multi-tenant service, this means low-latency and consistent performance are not Athena's sweet spot. A cloud data warehouse will outperform Athena in most cases.
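Because partitions are the main pruning mechanism available here, it helps to see what Hive-style partition pruning amounts to. The sketch below is a conceptual illustration only, not Athena's engine: the object keys, the `dt` partition column, and the function names are hypothetical. It parses `name=value` segments from object keys and keeps only the objects whose partition value falls inside the queried range, so everything else is never read.

```python
from typing import Dict, List


def parse_partitions(key: str) -> Dict[str, str]:
    """Extract Hive-style `name=value` path segments from an object key."""
    parts: Dict[str, str] = {}
    for segment in key.split("/"):
        if "=" in segment:
            name, _, value = segment.partition("=")
            parts[name] = value
    return parts


def prune(keys: List[str], column: str, lower: str, upper: str) -> List[str]:
    """Keep only objects whose partition value lies in [lower, upper]."""
    kept = []
    for key in keys:
        value = parse_partitions(key).get(column)
        # ISO-8601 dates compare correctly as plain strings.
        if value is not None and lower <= value <= upper:
            kept.append(key)
    return kept


if __name__ == "__main__":
    # Hypothetical object keys laid out with a `dt` partition column.
    keys = [
        "events/dt=2024-01-01/part-0.parquet",
        "events/dt=2024-01-02/part-0.parquet",
        "events/dt=2024-01-03/part-0.parquet",
        "events/dt=2024-02-01/part-0.parquet",
    ]
    print(prune(keys, "dt", "2024-01-02", "2024-01-03"))
```

Pruning works only at partition granularity: a filter on a column that is not part of the path still has to read every object in the selected partitions, which is the gap that indexes fill in other systems.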
## Use cases
There are a host of different analytics use cases that can be supported by a data warehouse. Look at your legacy technologies and their workloads, as well as the new possible use cases, and figure out which ones you will need to support in the next few years.
| Feature | ClickHouse | Athena |
|---|---|---|
| Low-latency dashboards | • Sub-second load times at TB+ scale with proper indexing • ClickHouse Cloud reduces engineering overhead with managed service • Proven low-latency performance (120ms at 2500 QPS in benchmarks) • Purpose-built for low-latency OLAP and real-time analytics | • Seconds to minutes response times for interactive dashboards • Performance varies based on data partitioning, file formats, and query optimization • Provisioned Capacity can improve consistency for dashboard workloads • Best suited for analytical dashboards rather than sub-second operational dashboards |
| Enterprise BI | • Growing ecosystem with 50+ integrations including major BI tools • Native MySQL protocol support enables broad BI tool compatibility • Strong SQL compliance with PostgreSQL compatibility • Best suited for modern analytical workloads and engineering-managed use cases | • Good integration with AWS ecosystem BI tools (QuickSight, etc.) • Standard SQL compatibility enables most BI tool connections • Cost-effective for variable workloads and ad-hoc analytics • JDBC/ODBC drivers support enterprise BI tools • Limited advanced BI features compared to dedicated data warehouses |
| Data Apps and AI Applications (Customer-facing low-latency high concurrency) | • Sub-second response times at TB+ scale • Supports 1000 concurrent users per replica • Strong price-performance on customer-facing applications • Native vector search and embeddings | • Default concurrency limits (25 DML/20 DDL queries) may require service quota increases • Provisioned Capacity enables higher concurrency with dedicated resources • Seconds-level response times typical • Cost-effective for customer-facing analytics with proper optimization • Best suited for analytical rather than operational workloads • No native AI capabilities |
| Ad hoc | • Good for ad-hoc queries with ClickHouse Cloud's separated storage/compute architecture • Join optimizations enable more query complexity • Strong sampling capabilities (TABLESAMPLE) for exploratory analysis • Resource management through user quotas prevents query interference • Materialized views offer performance improvements for common aggregation patterns, ad-hoc users specify directly in SQL | • Purpose-built for ad-hoc analytics on data lakes • Serverless with zero infrastructure management • Direct querying of S3 data without ETL • Cost-effective pay-per-query model ideal for exploratory analysis • Strong support for multiple data formats and federated queries • Apache Spark integration for advanced analytics |
ClickHouse was not designed to be a data warehouse, but rather a low-latency query execution runtime. Managing it typically requires significant engineering overhead. Hence, it's a good fit for engineering-managed operational use cases and customer-facing data apps, where low latency matters. It is not a good fit for a general-purpose data warehouse, nor for ad hoc analytics or ELT.
Athena is a great choice for ad hoc analytics. You can keep the data where it is and start querying without worrying about hardware or pretty much anything else, given that Athena is serverless and takes care of everything behind the scenes. However, it is not a great fit when you need consistent and fast query performance, high concurrency, or both. This is why it is typically not the best choice for operational and customer-facing applications. It can also be used easily and flexibly for batch processing, which is often leveraged for ML use cases.