Firebolt is also a decoupled storage and compute architecture that adds storage and query optimizations for 10x better performance and increased efficiency. While it does have isolated tenancy like Snowflake, it currently only runs on AWS. It also allows SQL to be run against external data formats to support ingestion. It also lets you choose any engine node type and number for each engine (cluster.) But it currently only runs on AWS.
Athena is built on a decoupled storage and compute architecture, though it only provides and controls the compute part and does not manage ingestion or storage. It is also only on multi-tenant shared resources. If you are a Redshift customer you can use Redshift Spectrum, which is dedicated Athena deployed on up to 10x the number of Redshift nodes in your own VPC, for the same price as Athena.
Firebolt provides the same scalability benefits of a decoupled storage and compute architecture. It improves compute efficiency through its optimizations, and by allowing the choice of any sized node and number of nodes for each cluster. It also improves write scalability and supports continuous ingestion. Firebolt also improves network efficiency by only accessing the data ranges needed, not entire partitions.
Athena is a shared multi-tenant resource, which means each account needs to be throttled to protect every other account’s performance. One customer was unable to handle any table or join above 5 billion rows. By default Athena supports a maximum of 20 concurrent users. If scalability is a top priority, Athena is probably the wrong choice.
Firebolt has clearly demonstrated storage and compute optimization, along with indexing, make a big difference in performance. Benchmarks by Firebolt, customers and prospects have demonstrated 4-6000x performance gains across a wide range of queries compared to any of the alternatives. This comes in part from more efficient storage access, where its F3 format and remote data access only fetches the data needed, not entire partitions. Query optimization, combined with extensive indexing also make a big difference as demonstrated through specific query examples of the impact of primary, aggregating and join indexes. Choice of any size and number of nodes for each engine helps as well. Firebolt also added native semi-structured data support and continuous, low latency ingestion.
Athena, and Presto, should be the worst at performance, by design. The reason is that it sacrifices storage-compute optimization to get support for federated queries across multiple data sources. But there is a reason Presto is so popular. Even with that handicap, Presto and Athena do very well. Presto can come close to Redshift and Snowflake in performance when both Presto and the external storage is managed by experts. But there is no support for indexing. Specifically with Athena, you cannot guarantee performance as a shared multi-tenant resource. In general, if performance is a top concern and you can bring data together via a data pipeline and optimize data with compute, then Athena or Presto are not the best choice.
Firebolt offers many of the same benefits as Snowflake with its decoupled storage and compute, particularly isolation of workloads and support for high user concurrency. It is the only cloud data warehouse that has optimized compute and storage together for faster ingestion, network and query performance. Its F3 format enables sub-second network access. Indexing and query optimization enables sub-second query performance. It uniquely enables continuous ingestion at scale as well. This makes Firebolt not only well suited for reporting and dashboards, but also much better for interactive and ad hoc use cases, as well as operational and customer-facing use cases.
Athena is one of the best “one-off” query engines; all you have to do is provide the data and pay $5 a TB. If you need to quickly pull together multiple data sources, it’s a great option. Redshift Spectrum is a great add-on option for Redshift for federated queries. But if you don’t need federated queries, need performance, and need anything other than one-off or occasional analytics, Athena is not a good option for any of these use cases. There is no data, network or query optimization, no indexing beyond pruning indexes like others.
Firebolt is the only data warehouse with decoupled storage and compute that supports ad hoc and semi-structured data analytics with sub-second performance at scale. It also combines simplified administration with choice and control over node types and 10x or greater efficiency for the best price-performance. This makes it the best choice for ad hoc, high performance, operational and customer-facing analytics.
Athena is arguably the easiest, least expensive and best suited for “one-off analytics”. But it is also the most limited, and requires you to manage your own (external) storage and ingestion very well, which is especially hard for continuous ingestion. This makes Athena the least-suited for any ongoing, frequent use case.