Athena vs Redshift

A detailed comparison

Compare Athena vs Redshift by the following set of categories:

Amazon Athena Amazon Redshift
Elasticity - separation of storage and compute Yes (query engine, no storage) Redshift RA3, Spectrum only
Supported cloud infrastructure AWS only AWS only
Isolated tenancy - option for dedicated resources Multi-tenant pooled resources Isolated tenant, use own VPC
Compute - node types N/A no choice Any size, and types
Data - internal/external, writable storage External only Internal writable, external (Spectrum)
Security - data Shared resources Isolated/VPC tenant for storage and compute, Encryption at rest, RBAC
Security - network Shared resources Firewall, SSL, PrivateLink whitelist/blacklist control, isolated/VPC tenancy

Athena is built on a decoupled storage and compute architecture, though it only provides and controls the compute part and does not manage ingestion or storage. It is also only on multi-tenant shared resources. If you are a Redshift customer you can use Redshift Spectrum, which is like dedicated Athena but not built on Presto, deployed on up to 10x the number of Redshift nodes in your own VPC, for the same price as Athena.

Redshift has the oldest architecture with the best options. It does not separate storage and compute. While it now has RA3 nodes which allow you to scale compute and only cache the data you need locally, all compute still operates together. You cannot separate workloads. While you can only run Redshift as an isolated workload on AWS, it has the most options on AWS, including the ability to deploy it in your own VPC.

Athena vs Redshift - Architecture

The biggest difference among cloud data warehouses are whether they separate storage and compute, how much they isolate data and compute, and what clouds they can run on.

Amazon Athena Amazon Redshift
Elasticity - (individual) query scalability Automatic (but shared resources) Manual
Elasticity - user (concurrent query) scalability Limited - 20 concurrent queries by default Autoscale up to 10 clusters, 15 queries per cluster, 50 queued queries total, max.
Write scalability - batch N/A 1 master cluster
Write scalability - continuous N/A (mostly batch-centric storage) Limited (table-level locking)
Data scalability Up to 100 partitions per table, 100 buckets (default) Only with RA3: 128 RA3 nodes, 8PB of data.

Athena is a shared multi-tenant resource, which means each account needs to be throttled to protect every other account’s performance. One customer was unable to handle any table or join above 5 billion rows. By default Athena supports a maximum of 20 concurrent users. If scalability is a top priority, Athena is probably the wrong choice.

Redshift is limited in scale because even with RA3, it cannot distribute different workloads across clusters. While it can scale to up to 10 clusters automatically to support query concurrency, it can only handle a maximum of 50 queued queries across all clusters by default. In addition, because it locks at the table level, it is better suited for batch ingestion and limited in its write throughput.

Athena vs Redshift - Scalability

There are three big differences among data warehouses and query engines that limit scalability: decoupled storage and compute, dedicated resources, and continuous ingestion.

Amazon Athena Amazon Redshift
Indexes None. None.
Query optimization - performance Limited cost-based optimization Limited cost-based optimization
Tuning No choice of resources Choice of (limited) node types
Storage format S3 Native Redshift storage (not Spectrum)
Ingestion performance N/A (storage and ingestion separate) Batch-centric (table-level locking)
Ingestion latency Not well suited for low latency visibility since unable to see as new values during ingestion Batch-centric (minute-level)
Partitioning Partition pruning Distribution, sort keys
Caching None Result cache
Semi-structured data - native JSON functions within SQL Yes (Lambda) Limited
Semi-structured data - native JSON storage type No No
Semi-structured data - performance Slow (full load into RAM, full scan) Slow (flattens JSON into table)

Athena, and Presto, should be the worst at performance, by design. The reason is that it sacrifices storage-compute optimization to get support for federated queries across multiple data sources. But there is a reason Presto is so popular. Even with that handicap, Presto and Athena do very well. Presto can come close to Redshift and Snowflake in performance when both Presto and the external storage is managed by experts. But there is no support for indexing. Specifically with Athena, you cannot guarantee performance as a shared multi-tenant resource. In general, if performance is a top concern and you can bring data together via a data pipeline and optimize data with compute, then Athena or Presto are not the best choice.

Redshift does provide a result cache for accelerating repetitive query workloads and also has more tuning options than some others. But it does not deliver much faster compute performance than other cloud data warehouses in benchmarks. While its storage access is more efficient, with smaller data block sizes being fetched over the network, it does not perform a lot of query optimization, and has no support for indexes. It also has less support for semi-structured data or low-latency ingestion at any reasonable scale.

Athena vs Redshift - Performance

Performance is the biggest challenge with most data warehouses today.
While decoupled storage and compute architectures improved scalability and simplified administration, for most data warehouses it introduced two bottlenecks; storage, and compute. Most modern cloud data warehouses fetch entire partitions over the network instead of just fetching the specific data needed for each query. While many invest in caching, most do not invest heavily in query optimization. Most vendors also have not improved continuous ingestion or semi-structured data analytics performance, both of which are needed for operational and customer-facing use cases.

Amazon Athena Amazon Redshift
Reporting Yes Yes
Dashboards Fixed view Fixed view
Ad hoc Sec-min first-time query performance Sec-min first-time query performance
Operational or customer-facing analytics (high concurrency, continuously updating / streaming data) Slow query performanceand limited scale. Limited continuous writes and concurrency, slow semi- structured data performance Slower query performance. Limited continuous writes and concurrency, limited semi- structured data support
Data processing engine (Exports or publishes data) Exports query results Unload data as Parquet
Data science/ML Export query results Invoke ML (SageMaker) in SQL

Athena is one of the best “one-off” query engines; all you have to do is provide the data and pay $5 a TB. If you need to quickly pull together multiple data sources, it’s a great option. Redshift Spectrum is a great add-on option for Redshift for federated queries. But if you don’t need federated queries, need performance, and need anything other than one-off or occasional analytics, Athena is not a good option for any of these use cases. There is no data, network or query optimization, no indexing beyond pruning indexes like others.

Redshift was originally designed to support traditional internal BI reporting and dashboard use cases for analysts. Without second-level performance, it cannot support any interactive and ad hoc analytics. It also has a limit of 50 queued queries by default, which limits concurrency, and a lack of support for continuous ingestion. All of these limitations mean Redshift for operational and customer-facing use cases.

Athena vs Redshift - Use cases

There are a host of different analytics use cases that can be supported by a data warehouse. Look at your legacy technologies and their workloads, as well as the new possible use cases, and figure out which ones you will need to support in the next few years. They include:

Reporting where relatively static reports are created by analysts against historical data, and used by executives, managers, and now increasingly by employees and customers

Dashboards created by analysts against historical or live data, and used by executives, managers, and increasingly by employees and customers via Web-based applications

Interactive and ad hoc analytics within dashboards or other tools for on-the-fly interactive analysis either by expert analysts, or increasingly by employees and customers via self-service

High performance analytics that require very large or complex queries with sub-second performance.

Big data analytics using semi-structured or unstructured data and complex queries or functionality

Operational and customer-facing analytics built by development teams that deliver historical and live data and analytics to larger groups of employees and customers

Amazon Athena Amazon Redshift
Administration - deployment, management No administration or tuning Easy to provision, harder to configure and manage
Choice - provision different cluster types on same data No No
Choice - provision different number of nodes No Yes
Choice - provision different node types No Yes(Limited)
Pricing - compute None $0.25-13 per node on demand, 40% less for up front
Pricing - storage N/A (not part of Athena) Stored data only. RA3: $24 per TB per month. S3: AWS S3 costs.
Pricing - transferred data $5 per TB scanned (10MB per query) Spectrum: $5 per TB scanned (10MB min per query)

Athena is arguably the easiest, least expensive and best suited for “one-off analytics”. But it is also the most limited, and requires you to manage your own (external) storage and ingestion very well, which is especially hard for continuous ingestion. This makes Athena the least-suited for any ongoing, frequent use case.

Redshift, while it is arguably the most mature and feature-rich, is also the most like a traditional data warehouse in its limitations. This makes it the hardest to manage, and costly overall for traditional reporting and dashboards, and not as well suited for the newer use cases.

Athena vs Redshift - Cost

This is perhaps the strangest, and yet the clearest comparison; cost. There are a lot of differences in the details, but at a high level, the main differences should be clear.

Compare other data warehouses

See all data warehouse comparisons ->

Talk to a Firebolt solution architect