Firebolt vs Athena

A detailed comparison

Compare Firebolt vs Athena by the following set of categories:

Category | Firebolt | Amazon Athena
Elasticity - separation of storage and compute | Yes | Yes (query engine, no storage)
Supported cloud infrastructure | AWS only | AWS only
Isolated tenancy - option for dedicated resources | Multi-tenant dedicated resources | Multi-tenant pooled resources
Compute - node types | 1-128 nodes, choice of any types | N/A, no choice
Data - internal/external, writable storage | External tables (used for ingestion) | External only
Security - data | Dedicated resources for storage and compute. Encryption at rest, RBAC | Shared resources
Security - network | Firewall and WAF, SSL, PrivateLink, whitelist/blacklist control, isolated tenancy option | Shared resources

Firebolt is also built on a decoupled storage and compute architecture, and adds storage and query optimizations for 10x better performance and increased efficiency. It offers isolated tenancy like Snowflake, and it allows SQL to be run against external data formats to support ingestion. It also lets you choose the node type and number of nodes for each engine (cluster). It currently runs only on AWS.

Athena is built on a decoupled storage and compute architecture, though it only provides and controls the compute part and does not manage ingestion or storage. It also runs only on multi-tenant shared resources. If you are a Redshift customer you can use Redshift Spectrum, which is like a dedicated Athena (though not built on Presto), deployed on up to 10x the number of Redshift nodes in your own VPC, for the same price as Athena.

Firebolt vs Athena - Architecture

The biggest differences among cloud data warehouses are whether they separate storage and compute, how much they isolate data and compute, and which clouds they can run on.

Category | Firebolt | Amazon Athena
Elasticity - (individual) query scalability | 1-click cluster resize of node type, number of nodes | Automatic (but shared resources)
Elasticity - user (concurrent query) scalability | Unlimited manual scaling | Limited - 20 concurrent queries by default
Write scalability - batch | Strong. Multi-master parallel batch | N/A
Write scalability - continuous | Multi-master continuous writes | N/A (mostly batch-centric storage)
Data scalability | No limit | Up to 100 partitions per table, 100 buckets (default)

Firebolt provides the scalability benefits of a decoupled storage and compute architecture. It improves compute efficiency through its optimizations and by letting you choose any node size and number of nodes for each cluster. It also improves write scalability and supports continuous ingestion. Firebolt also improves network efficiency by accessing only the data ranges needed, not entire partitions.
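To make the idea of "accessing only the data ranges needed" concrete, here is a minimal, generic sketch of a sparse index over sorted data. It is a toy model under stated assumptions (unique, sorted keys; fixed-size blocks) and is not Firebolt's F3 implementation; the names `build_sparse_index` and `blocks_for_range` are hypothetical.

```python
from bisect import bisect_right


def build_sparse_index(sorted_keys, block_size):
    """Return the first key of every block of `block_size` rows.

    Assumes `sorted_keys` is sorted ascending and contains no duplicates.
    """
    return [sorted_keys[i] for i in range(0, len(sorted_keys), block_size)]


def blocks_for_range(sparse_index, lo, hi):
    """Return the block numbers that may hold keys in the range [lo, hi]."""
    # Last block whose first key is <= lo (clamped to block 0).
    first = max(bisect_right(sparse_index, lo) - 1, 0)
    # Blocks whose first key is > hi cannot contain matching rows.
    last = bisect_right(sparse_index, hi) - 1
    return list(range(first, last + 1))


# 1,000 rows split into 10 blocks of 100 rows each.
index = build_sparse_index(list(range(1000)), block_size=100)
needed = blocks_for_range(index, 250, 320)  # only 2 of the 10 blocks
```

A reader that fetches only the blocks in `needed` transfers a fraction of the data, whereas partition-level pruning would fetch the whole partition containing those keys.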

Athena is a shared multi-tenant resource, which means each account needs to be throttled to protect every other account’s performance. One customer was unable to handle any table or join above 5 billion rows. By default Athena supports a maximum of 20 concurrent queries. If scalability is a top priority, Athena is probably the wrong choice.
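Because of that default concurrency cap, a client that fans out many queries usually has to bound how many are in flight at once. The following is a minimal sketch, assuming a caller-supplied `run_query` function (hypothetical; it stands in for whatever client call submits a query and waits for its result). It is not an Athena API.

```python
from concurrent.futures import ThreadPoolExecutor

# Default limit quoted in the comparison above; adjust to your account's quota.
MAX_CONCURRENT_QUERIES = 20


def run_all(queries, run_query, max_concurrent=MAX_CONCURRENT_QUERIES):
    """Run every query, keeping at most `max_concurrent` in flight."""
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        # map() returns results in the same order as the input queries.
        return list(pool.map(run_query, queries))
```

A thread pool is used here because the work is I/O-bound waiting; the pool size is the only throttle, so queued queries simply wait for a free worker.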

Firebolt vs Athena - Scalability

Three big differences among data warehouses and query engines determine how well they scale: decoupled storage and compute, dedicated resources, and continuous ingestion.

Category | Firebolt | Amazon Athena
Indexes | Indexes for data access, joins, aggregation, search | None
Query optimization - performance | Index- and cost-based optimization, vectorization, JIT, pushdown optimization | Limited cost-based optimization
Tuning | Choice of any node size, indexing. Optimized F3 storage (on S3). Data access integrated across storage and cache | No choice of resources
Storage format | Optimized F3 storage (on S3). Data access integrated across disk, SSD, RAM | S3
Ingestion performance | Multi-master, lock-free, high-performance ingestion with unlimited scale for batch or continuous ingestion | N/A (storage and ingestion separate)
Ingestion latency | Immediately visible during ingestion | Not well suited for low-latency visibility, since new values cannot be seen during ingestion
Partitioning | F3 with sparse indexing | Partition pruning
Caching | F3 (cache) with aggregating, join, and search indexes | None
Semi-structured data - native JSON functions within SQL | Yes (Lambda) | Yes (Lambda)
Semi-structured data - native JSON storage type | Yes (nested array type, compact) | No
Semi-structured data - performance | Fast (array operations without full scans) | Slow (full load into RAM, full scan)

Firebolt has demonstrated that storage and compute optimization, along with indexing, makes a big difference in performance. Benchmarks by Firebolt, customers and prospects have shown 4x to 6,000x performance gains across a wide range of queries compared to the alternatives. This comes in part from more efficient storage access, where its F3 format and remote data access fetch only the data needed, not entire partitions. Query optimization combined with extensive indexing also makes a big difference, as demonstrated through specific query examples of the impact of primary, aggregating and join indexes. Choice of any size and number of nodes for each engine helps as well. Firebolt also added native semi-structured data support and continuous, low-latency ingestion.
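As a rough, generic illustration of why a pre-aggregated index can answer a query without scanning the underlying rows, consider the toy model below. It is only a sketch of the general technique (aggregates maintained at write time); the class name `AggregatingIndex` and its methods are hypothetical and do not describe Firebolt's actual implementation.

```python
from collections import defaultdict


class AggregatingIndex:
    """Toy pre-aggregation: keeps a running SUM and COUNT per group key."""

    def __init__(self):
        self._sum = defaultdict(float)
        self._count = defaultdict(int)

    def ingest(self, key, value):
        # Updated once per row at write time, so reads never scan raw rows.
        self._sum[key] += value
        self._count[key] += 1

    def total(self, key):
        """SUM(value) for one group; 0.0 if the group was never seen."""
        return self._sum.get(key, 0.0)

    def average(self, key):
        """AVG(value) for one group; None if the group was never seen."""
        n = self._count.get(key, 0)
        return self._sum[key] / n if n else None
```

The trade-off is the usual one for indexes: a small amount of extra work on every write in exchange for constant-time answers to the aggregate queries the index covers.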

Athena and Presto should be the worst at performance, by design, because they sacrifice storage-compute optimization to support federated queries across multiple data sources. But there is a reason Presto is so popular. Even with that handicap, Presto and Athena do very well. Presto can come close to Redshift and Snowflake in performance when both Presto and the external storage are managed by experts. But there is no support for indexing. With Athena specifically, you cannot guarantee performance because it is a shared multi-tenant resource. In general, if performance is a top concern and you can bring data together via a data pipeline and optimize data with compute, then Athena and Presto are not the best choice.

Firebolt vs Athena - Performance

Performance is the biggest challenge with most data warehouses today. While decoupled storage and compute architectures improved scalability and simplified administration, for most data warehouses they introduced two bottlenecks: storage and compute. Most modern cloud data warehouses fetch entire partitions over the network instead of fetching just the specific data needed for each query. While many invest in caching, most do not invest heavily in query optimization. Most vendors also have not improved continuous ingestion or semi-structured data analytics performance, both of which are needed for operational and customer-facing use cases.

Category | Firebolt | Amazon Athena
Reporting | Yes | Yes
Dashboards | Fixed view, dynamic / fast queries, changing data | Fixed view
Ad hoc | Sub-second to seconds first-time query performance | Seconds to minutes first-time query performance
Operational or customer-facing analytics (high concurrency, continuously updating / streaming data) | Yes. Supports continuous writes at scale, fast semi-structured data performance | Slow query performance and limited scale. Limited continuous writes and concurrency, slow semi-structured data performance
Data processing engine (exports or publishes data) | Exports query results | Exports query results
Data science/ML | Exports query results | Exports query results

Firebolt offers many of the same benefits as Snowflake with its decoupled storage and compute, particularly isolation of workloads and support for high user concurrency. It is the only cloud data warehouse that has optimized compute and storage together for faster ingestion, network and query performance. Its F3 format enables sub-second network access. Indexing and query optimization enable sub-second query performance. It also uniquely enables continuous ingestion at scale. This makes Firebolt not only well suited for reporting and dashboards, but also much better for interactive and ad hoc use cases, as well as operational and customer-facing use cases.

Athena is one of the best “one-off” query engines; all you have to do is provide the data and pay $5 per TB scanned. If you need to quickly pull together multiple data sources, it’s a great option. Redshift Spectrum is a great add-on option for Redshift for federated queries. But if you don’t need federated queries, or you need performance, or you need anything other than one-off or occasional analytics, Athena is not a good option. It offers no data, network or query optimization, and no indexing beyond partition pruning.

Firebolt vs Athena - Use cases

There are a host of different analytics use cases that can be supported by a data warehouse. Look at your legacy technologies and their workloads, as well as the new possible use cases, and figure out which ones you will need to support in the next few years. They include:

Reporting where relatively static reports are created by analysts against historical data, and used by executives, managers, and now increasingly by employees and customers

Dashboards created by analysts against historical or live data, and used by executives, managers, and increasingly by employees and customers via Web-based applications

Interactive and ad hoc analytics within dashboards or other tools for on-the-fly interactive analysis either by expert analysts, or increasingly by employees and customers via self-service

High performance analytics that require very large or complex queries with sub-second performance

Big data analytics using semi-structured or unstructured data and complex queries or functionality

Operational and customer-facing analytics built by development teams that deliver historical and live data and analytics to larger groups of employees and customers

Category | Firebolt | Amazon Athena
Administration - deployment, management | Easy to deploy and resize, easy to add indexing, change node types | No administration or tuning
Choice - provision different cluster types on same data | Choice of node types, engine sizes | No
Choice - provision different number of nodes | Yes | No
Choice - provision different node types | Yes | No
Pricing - compute | Choose any node size and number. Compute costs deliver 10x or greater price-performance advantage | None
Pricing - storage | $23/TB | N/A (not part of Athena)
Pricing - transferred data | None | $5 per TB scanned (10 MB minimum per query)
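
The Athena scan-based pricing in the last row can be turned into a quick estimate. This is a back-of-the-envelope sketch, not a billing calculator: it reads the 10 MB figure as a per-query minimum charge and assumes a binary terabyte (1024^4 bytes); both are labeled assumptions, and the function names are hypothetical.

```python
ATHENA_USD_PER_TB = 5.0               # "$5 per TB scanned" from the table
MIN_BYTES_PER_QUERY = 10 * 1024**2    # assumption: 10 MB minimum per query
BYTES_PER_TB = 1024**4                # assumption: binary terabyte


def athena_query_cost(bytes_scanned):
    """Estimated USD cost of one query that scans `bytes_scanned` bytes."""
    billable = max(bytes_scanned, MIN_BYTES_PER_QUERY)
    return ATHENA_USD_PER_TB * billable / BYTES_PER_TB


def monthly_cost(bytes_per_query, queries_per_day, days=30):
    """Estimated USD cost of running the same query repeatedly for a month."""
    return athena_query_cost(bytes_per_query) * queries_per_day * days


# Example: a dashboard query that scans 200 GB, run 1,000 times a day.
per_query = athena_query_cost(200 * 1024**3)   # about $0.98
per_month = monthly_cost(200 * 1024**3, 1000)  # about $29,297
```

The example shows why scan-based pricing suits occasional queries: the per-query cost is small, but it scales linearly with both data scanned and query frequency.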

Firebolt is the only data warehouse with decoupled storage and compute that supports ad hoc and semi-structured data analytics with sub-second performance at scale. It also combines simplified administration with choice and control over node types and 10x or greater efficiency for the best price-performance. This makes it the best choice for ad hoc, high performance, operational and customer-facing analytics.

Athena is arguably the easiest and least expensive option, and the best suited for “one-off analytics”. But it is also the most limited, and requires you to manage your own (external) storage and ingestion very well, which is especially hard for continuous ingestion. This makes Athena the least suited for any ongoing, frequent use case.

Firebolt vs Athena - Cost

Cost is perhaps the strangest, and yet the clearest, comparison. There are a lot of differences in the details, but at a high level, the main differences should be clear.
