WHITEPAPER

A Comparison of Data Warehouse and Query Engines on AWS

Choosing the right data warehouse and analytics infrastructure on Amazon Web Services (AWS) can be confusing. Part of the problem is understanding the differences between the older on-premises warehouses and three generations of cloud data warehouse and query engine technologies that have all been evolving over time.

Comparison - Redshift, Athena, Snowflake, Firebolt

What follows is a side-by-side comparison of the options on AWS across 1st, 2nd, and 3rd generation data warehouses and query engines.

1st generation
Scalability
Architecture
2nd generation
3rd generation
2nd generation
Query scalability
Elasticity - separation of storage and compute
Supported cloud infrastructure
Isolated tenancy
User scalability (automatic or manual)
User concurrency (maximum concurrency)
Write scalability (batch, continuous, upserts)
Performance
Indexing
Query optimization
Storage and partitioning
Semi-structured native data storage, functions, performance
Use cases
COST
Reporting and dashboards
Administration
Price
Ad hoc, interactive analytics
Operations, customer facing
Choice of resources
Limited - 15 queries per cluster, 50 queued queries total
Limited - 20 concurrent queries by default
Up to 10 same-size (1-128 node) warehouses
Unlimited user concurrency
Manual
Only RA3, Spectrum
AWS only
AWS only
AWS only
AWS, Azure, Google Cloud
Automatic
Auto-scaling but shared resources
Yes (query engine only)
Yes
Yes
No
VPS only
Yes
Yes
Automatic but limited
Manual 1-click resize to larger warehouse
Automatic
1-click resize to larger EC2 type, number nodes
Automate with scripting
Limited - 1 cluster with table locking, slow upserts
N/A - better suited for columnar (read-only)
Limited - batch-centric, slow updates
Strong - batch, and continuous, fast updates
None
None
None
Sparse, aggregate, join
Limited
Limited
S3
Non-native, native lambda and slow
Yes (static)
Yes (static)
Yes, and ad hoc
sec-minutes
sec-minutes
sub-second
Batch-centric, low scale
Batch-centric, higher scale
Continuous, high scale
No administration
Easy to deploy and resize
Easy to deploy and resize
Single cluster only
Pooled resources
(no choice)
Choice of fixed size warehouses
Choice of EC2 instance types, engine sizes
Limited
Micro-partition storage, separate RAM
Optimized across storage and RAM (F3)
Non-native,
native funct.
and slow
Native storage, native lambda
and fast
Extensive
Internal optimized, and S3
Non-native, limited
and slow
Yes (static)
sec-minutes
Batch-centric, lower scale
Harder to manage
High: based on compute, storage, data transferred
Low: based on per TB scanned
High: based on compute, storage
Low: based only on size of data

Summary

The detailed comparison of Redshift, Athena, Snowflake, and Firebolt across architecture, scalability, performance, use cases and cost of ownership highlights the following major differences:

In short, some data warehouses are better for different use cases. Using this information will hopefully help you choose the right data warehouse or query engine for different use cases. It may also help you prepare your analytics infrastructure so that these choices can be made. 

There has never been such a thing as a single data warehouse that satisfied all the analytics needs of a company. The combination of today’s modern data pipelines and data lakes with the simplicity of cloud services have made adding another cloud warehouse relatively straightforward and cost effective. Those companies that were already using a cloud data warehouse for reporting and dashboards were able to add Firebolt as another cloud data warehouse in weeks, and use it for ad hoc, semi-structured data, operational and customer-facing analytics, while leaving the existing analytics in place.

If you do put a modern data pipeline in place that does make it easier to redirect your data from a data lake or other “single source of the truth” then you will be able to choose the best combination of price and performance for each analytics need.