# Athena vs Redshift (/comparison/athena-vs-redshift)


<Video id="B4vtzLbf14U" />

## Architecture [#architecture]

The biggest difference among cloud data warehouses are whether they separate storage and compute, how much they isolate data and compute, and what clouds they can run on.

| Feature                                           | Athena                                                                                                                                                                                                                             | Redshift                                                                                                           |
| ------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------ |
| Separation of storage and compute                 | Yes, serverless with optional provisioned capacity. Workloads can be isolated through Workgroups and Capacity Reservations                                                                                                         | RA3 instances enable separation of compute and storage, but limited workload isolation compared to other platforms |
| Supported cloud infrastructure                    | AWS only                                                                                                                                                                                                                           | AWS only                                                                                                           |
| Isolated tenancy – option for dedicated resources | • Multi-tenant pooled resources by default<br />• Dedicated compute resources available via Provisioned Capacity<br />• VPC endpoint connections supported                                                                         | • Isolated tenant & resources • Runs in your VPC                                                                   |
| Control vs abstraction of compute                 | • Serverless by default with no infrastructure control<br />• Optional Provisioned Capacity allows dedicated DPU allocation (minimum 24 DPUs)<br />• Two pricing models: on-demand ($5/TB scanned) or provisioned ($0.30/DPU-hour) | • Configurable cluster size<br />• Configurable compute types                                                      |
| Self-hosted and hybrid deployment options         | No self-hosted options – serverless only                                                                                                                                                                                           | Limited hybrid options with Redshift Serverless                                                                    |
| ACID Compliance and Transactions                  | No ACID compliance – eventual consistency model                                                                                                                                                                                    | ACID compliant at table level with some limitations on concurrent operations                                       |

**Athena** is serverless and built on a decoupled storage and compute architecture that queries data directly in S3, without the need to ingest/copy the data. It runs in multi-tenancy with shared resources. Users do not have control over the compute resources Athena chooses to allocate per query from the shared resource pool. For folks requiring additional or dedicated resources, they can reserve dedicated processing capacity in the form of Data Processing Units (DPU), with each DPU providing 4 vCPU and 16 GB RAM. RPU allocation ranges from 24 - 1000 per region.

**Redshift** has the oldest architecture, being the first Cloud DW in the group. Its architecture wasn't designed to separate storage & compute. While it now has RA3 nodes which allow you to scale compute and only cache the data you need locally, all compute still operates together. You cannot separate and isolate different workloads over the same data, which puts it behind other decoupled storage/compute architectures. Redshift runs as an isolated tenant per customer, and unlike other cloud data warehouses, it is deployed in your VPC. Redshift offers a serverless option which is based on an abstracted unit called Redshift Processing Unit (RPU) ranging from 8 to 512 in increments of 8. Each RPU provides 2 vCPU and 16GB RAM. Thus, 8 RPU is equivalent to 16 vCPU / 128GB RAM. The minimum RPU is 8.

## Scalability [#scalability]

There are three big differences among data warehouses and query engines that limit scalability: decoupled storage and compute, dedicated resources, and continuous ingestion.

| Feature                                                         | Athena                                                                                                                                                                                                                    | Redshift                                                                                                                       |
| --------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------ |
| Elasticity – Scaling for larger data volumes and faster queries | • Fully abstracted on-demand scaling • Provisioned Capacity allows manual scaling of DPUs for predictable performance • Capacity reservations can be adjusted with minimum 1-hour billing periods                         | Available via Elastic Resize – slow and limited, downtime required                                                             |
| Elasticity – Scaling for higher concurrency                     | • Default limit of 25 concurrent DML queries and 20 DDL queries (adjustable via service quotas) • Provisioned Capacity enables higher concurrency with dedicated DPUs • Query queuing available when capacity is exceeded | • 5 concurrent queries per WLM queue by default (up to 8 queues) • Concurrency Scaling enables thousands of concurrent queries |

**Athena** is a shared multi-tenant resource, with no guarantees on the amount or availability of the resources allocated for your queries. From a data volume perspective, it can scale to large volumes, but large data volumes can suffer from very long run times and frequent time outs. Query concurrency is maxed at 20. If scalability is a top priority, Athena is probably not the best choice.

**Redshift** is limited in scale because even with RA3, it cannot distribute different workloads across clusters. While it can scale to up to 10 clusters automatically to support query concurrency, it can only handle a maximum of 50 queued queries across all clusters by default.

## Performance [#performance]

Performance is the biggest challenge with most data warehouses today.
While decoupled storage and compute architectures improved scalability and simplified administration, for most data warehouses it introduced two bottlenecks; storage, and compute. Most modern cloud data warehouses fetch entire partitions over the network instead of just fetching the specific data needed for each query. While many invest in caching, most do not invest heavily in query optimization. Most vendors also have not improved continuous ingestion or semi-structured data analytics performance, both of which are needed for operational and customer-facing use cases.

| Feature                                                      | Athena                                                                                                                                                                                                                                                                                                                                         | Redshift                                                                                                                                                                                                  |
| ------------------------------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Indexes                                                      | No traditional indexes – relies on partition pruning and data organization in S3. Uses columnar formats and compression for optimization                                                                                                                                                                                                       | None                                                                                                                                                                                                      |
| Compute tuning                                               | • No compute tuning in on-demand mode • Provisioned Capacity allows DPU allocation control (4 vCPU and 16GB RAM per DPU) • Minimum 24 DPUs with scaling in 4-DPU increments                                                                                                                                                                    | Choice over number of nodes and their type                                                                                                                                                                |
| Storage format                                               | Supports multiple formats: Parquet, ORC, Avro, JSON, CSV, TSV on S3. Native support for open table formats including Apache Iceberg, Apache Hudi, and Delta Lake                                                                                                                                                                               | Columnar & compressed storage (RA3 nodes)                                                                                                                                                                 |
| Table-level partition & pruning techniques                   | • User-defined table-level partitions with Hive-style partitioning • Pruning at partition level • Partition projection for advanced performance optimization • Supports open table formats with built-in partitioning                                                                                                                          | • No table partitions • User-defined distribution & sort keys are used to optimize for speed                                                                                                              |
| Result cache                                                 | Query result caching for up to 30 days with configurable retention. Results reuse supported across workgroups                                                                                                                                                                                                                                  | Yes                                                                                                                                                                                                       |
| Warm cache (SSD)                                             | No local caching – queries data directly from S3. Relies on S3's performance characteristics and intelligent tiering                                                                                                                                                                                                                           | Only with RA3 nodes at partition-level granularity                                                                                                                                                        |
| Support for semi-structured data & JSON functions within SQL | Yes, comprehensive JSON support including Lambda expressions, array functions, and native nested data handling                                                                                                                                                                                                                                 | Limited                                                                                                                                                                                                   |
| Vector Search and AI Capabilities                            | No native AI or vector search capabilities                                                                                                                                                                                                                                                                                                     | Limited AI capabilities – primarily through integrations                                                                                                                                                  |
| Query Optimizations                                          | • Cost-based optimizer (CBO) in Athena engine v3 • Query result caching (up to 30 days) • Partition projection for advanced optimization • CTAS for precomputed queries • Join reordering and aggregation pushdown • Automatic parallel query execution • Support for columnar formats (Parquet, ORC) • Integration with AWS Glue Data Catalog | • Basic query optimizer • Materialized views • Result caching • ANALYZE for table statistics • Workload management (WLM) • Automated materialized views (AutoMV) • AI-driven scaling (Serverless preview) |

**Athena**, (and Presto) are designed to query data where it is, sacrificing storage-compute optimizations. This makes it very convenient for easy and immediate querying but at the expense of performance. This typically puts Athena behind cloud data warehouses in terms of performance. But Athena still does relatively well in performance benchmarks, especially when external storage is managed by experts. While it supports partitions, there is no support for indexing, and together with the fact that resources are pooled from a shared multi-tenant service, low-latency and consistent performance are not Athena's sweet spot. A cloud data warehouse be more performant better than Athena in most cases.

**Redshift** does provide a result cache for accelerating repetitive query workloads and also has more tuning options than some others. But it does not deliver much faster compute performance than other cloud data warehouses in benchmarks. Sort keys can be used to optimize performance, but their contribution is limited. There is no support for indexes, and low-latency analytics at large data volumes is hard to achieve. Because Redshift decoupling of storage & compute is limited compared to other cloud data warehouses, it doesn't support isolating workloads, which means performance can degrade under pressure and competition for resources

## Use cases [#use-cases]

There are a host of different analytics use cases that can be supported by a data warehouse. Look at your legacy technologies and their workloads, as well as the new possible use cases, and figure out which ones you will need to support in the next few years.

| Feature                                                                      | Athena                                                                                                                                                                                                                                                                                                                                                                        | Redshift                                                                                                                                                                                                                                       |
| ---------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Low-latency dashboards                                                       | • Seconds to minutes response times for interactive dashboards • Performance varies based on data partitioning, file formats, and query optimization • Provisioned Capacity can improve consistency for dashboard workloads • Best suited for analytical dashboards rather than sub-second operational dashboards                                                             | • Seconds to tens of seconds load times at 100s of GB scale • Can achieve faster performance with Concurrency Scaling and proper tuning                                                                                                        |
| Enterprise BI                                                                | • Good integration with AWS ecosystem BI tools (QuickSight, etc.) • Standard SQL compatibility enables most BI tool connections • Cost-effective for variable workloads and ad-hoc analytics • JDBC/ODBC drivers support enterprise BI tools • Limited advanced BI features compared to dedicated data warehouses                                                             | • Mature and comprehensive Enterprise DW feature set • Extensive integrations with Enterprise BI ecosystem • Strong AWS ecosystem integration                                                                                                  |
| Data Apps and AI Applications (Customer-facing low-latency high concurrency) | • Default concurrency limits (25 DML/20 DDL queries) may require service quota increases • Provisioned Capacity enables higher concurrency with dedicated resources • Seconds-level response times typical • Cost-effective for customer-facing analytics with proper optimization • Best suited for analytical rather than operational workloads • No native AI capabilities | • 5 concurrent queries per WLM queue by default (up to 8 queues) • Concurrency Scaling enables thousands of concurrent queries • Seconds-level response times typical • Automatic scaling for burst workloads • Limited AI application support |
| Ad hoc                                                                       | • Purpose-built for ad-hoc analytics on data lakes • Serverless with zero infrastructure management • Direct querying of S3 data without ETL • Cost-effective pay-per-query model ideal for exploratory analysis • Strong support for multiple data formats and federated queries • Apache Spark integration for advanced analytics                                           | • Performance dependent on predefined distribution & sort keys • Elastic Resize enables adding compute resources • Typically subset of data loaded for ad-hoc analysis                                                                         |

**Athena** is a great choice for Ad-Hoc analytics. You can keep the data where it is, and start querying without worrying about hardware or pretty much anything else, given that Athena is serverless and takes care of everything behind the scenes. However, it is not a great fit when you need consistent and fast query performance, and/or high concurrency. This is why it is typically not the best choice for operational and customer-facing applications. It can be also easily and flexibly used for batch processing, which is often leveraged for ML use cases.

**Redshift** was originally designed to support traditional internal BI reporting and dashboard use cases for analysts. As such, it is typically used as a general-purpose Enterprise data warehouse. With deep integrations into the AWS ecosystem, it can also leverage AWS ML service, making it also useful for ML projects. However, given the coupling of storage & compute, and the difficulty in delivering low-latency analytics at scale, it is less suited for operational use cases and customer-facing use cases like Data Apps. The coupling of storage and compute, together with the need to predefine sort & dist keys for optimal performance, make it challenging to use for Ad-Hoc analytics.