Snowflake vs. Redshift
February 1, 2021

Snowflake vs. Redshift (vs. Firebolt)

 A detailed comparison:

How should you compare Snowflake vs Redshift vs other cloud data warehouses like Firebolt?

The short answer is by the following set of categories:

  • Architecture: elasticity, technology support, isolation and security
  • Scalability: elastic scale, scale efficiency, query and user concurrency
  • Performance: query, indexing, data type and storage optimization
  • Use cases: reporting, dashboards, ad hoc, operations and customer-facing analytics
  • Cost: administration, infrastructure resources, vendor pricing

If you want a detailed explanation you can read the data warehouse comparison guide or watch a webinar on comparing data warehouses explaining the criteria and how to prioritize.

But here’s a summary and a more detailed explanation.

Snowflake vs. Redshift vs. Firebolt - Architecture

The biggest difference among cloud data warehouses are whether they separate storage and compute, how much they isolate data and compute, and what clouds they can run on.

Snowflake Amazon Redshift Firebolt
Elasticity - separation of storage and compute Yes Redshift RA3, Spectrum only Yes
Supported cloud infrastructure AWS, Azure, Google Cloud AWS only AWS only
Tenancy - option for dedicated resources Multi-tenant dedicated resources (Only VPS isolated) Isolated tenant, use own VPC Multi-tenant and isolated tenancy options for compute and storag
Compute - instance types 1-128 nodes, unknown types Any size, and types 1-128 choice of any types
Data - internal/external, writable storage External tables supported Internal writable, external (Spectrum) External tables (used for ingestion), supports writes
Security - data Separate customer keys (only VPS is isolated tenant,Encryption at rest, RBAC Isolated/VPC tenant for storage and compute, Encryption at rest, RBAC Multi-tenant with customer keys or isolated tenant for storage and compute. Encryption at rest RBAC
Security - network Firewall, SSL, PrivateLink whitelist/blacklist control, isolated for VPS only Firewall, SSL, PrivateLink whitelist/blacklist control, isolated/VPC tenancy Firewall and WAF, SSL, PrivateLink whitelist/blacklist control, isolated tenancy option

Redshift has the oldest architecture with the best options. It does not separate storage and compute. While it now has RA3 nodes which allow you to scale compute and only cache the data you need locally, all compute still operates together. You cannot separate workloads. While you can only run Redshift as an isolated workload on AWS, it has the most options on AWS, including the ability to deploy it in your own VPC.

Snowflake was one of the first decoupled storage and compute architectures, making it the first to have nearly unlimited compute scale and workload isolation, and horizontal user scalability. It runs on AWS, Azure and GCP, and while by default it is multi-tenant compute and data, it can run in a Snowflake VPC. But it only provides 1, 2, 4, … 128 node clusters with no choice of node sizes. To get the biggest nodes you need to choose the biggest cluster.

Firebolt is also a decoupled storage and compute architecture that adds storage and query optimizations for 10x better performance and increased efficiency. While it does have isolated tenancy like Snowflake, it currently only runs on AWS. It also allows SQL to be run against external data formats to support ingestion. It also lets you choose any engine node type and number for each engine (cluster.)

Snowflake vs. Redshift vs. Firebolt - Scalability

Snowflake Amazon Redshift Firebolt
Elasticity - (individual) query scalability 1-click cluster resize, no choice of node size Manual 1-click cluster resize of EC2 type, number of nodes
Elasticity - user (concurrent query) scalability Autoscale up to 10 warehouses. Limited to 20 DML writes in queue per table Autoscale up to 10 clusters, 15 queries per cluster, 50 queued queries total, max. Unlimited manual scaling
Write scalability - batch Strong 1 master cluster Strong. Multi-master parallel batch
Write scalability - continuous Limited to 20 DML writes in queue per table Limited (table-level locking) Unlimited continuous writes
Data scalability No specified storage limit. 4XL data warehouse (128 nodes). 16MB max field size Only with RA3: 128 RA3 nodes, 8PB of data. No limit

There are three big differences among data warehouses and query engines that limit scalability: decoupled storage and compute, dedicated resources, and continuous ingestion.

Redshift is limited in scale because even with RA3, it cannot distribute different workloads across clusters. While it can scale to up to 10 clusters automatically to support query concurrency, it can only handle a maximum of 50 queued queries across all clusters. In addition, because it locks at the table level, it is better suited for batch ingestion and limited in its write/upsert throughput.

Snowflake delivers strong scalability with its decoupled storage and compute architecture. But it is inefficient at scaling for certain queries that require larger nodes, because the only way to get larger nodes is with larger clusters. It is better suited for batch writes and upserts as well because it requires entire micro-partitions to be rewritten for each write. Snowflake also transfers entire micro-partitions over the network, which creates a bottleneck at scale.

Firebolt provides the same scalability benefits of a decoupled storage and compute architecture. It improves compute efficiency through its optimizations, and by allowing the choice of any sized node and number of nodes for each cluster. It also improves write scalability and supports continuous ingestion without requiring partition rewrites for each single write. Firebolt also improves network efficiency by only accessing the data ranges needed, not entire partitions.

Snowflake vs. Redshift vs. Firebolt - Performance

Performance is the biggest challenge with most data warehouses today. While decoupled storage and compute architectures improved scalability and simplified administration, for most data warehouses it introduced two bottlenecks; storage, and compute. Most modern cloud data warehouses fetch entire partitions over the network instead of just fetching the specific data needed for each query. While many invest in caching, most do not invest heavily in query optimization. Most vendors also have not improved continuous ingestion or semi-structured data analytics performance, both of which are needed for operational and customer-facing use cases.

Snowflake Amazon Redshift Firebolt
Indexes None. None. Indexes for data access, joins, aggregation
Query optimization - performance Cost-based optimization, vectorization Limited cost- based optimization Index- and cost-based optimization,vectorization JIT, pushdown optimization
Tuning - CPU, RAM, SSD, Storage Can only choose warehouse size not node types Choice of (limited) instance types Choice of any node size (CPU/RAM/SSD), tuning options
Storage format Optimized micro-partition storage (S3), separate RAM Native Redshift storage (not Spectrum) Optimized F3 storage (on S3). Data access integrated across disk, SSD, RAM)
Ingestion performance Batch-centric (micro-partition level locking, limit of 20 queued writes per table) Batch-centric (table-level locking) Multi-master, lock-free high performance ingestion with unlimited scale for batch or continuous ingestion
Ingestion latency Batch write preferred (1+ minute interval). Requires rewrite of entire micro-partition Batch-centric (table-level locking) Immediately visible during ingestion
Partitioning Micro-partition / pruning, cluster keys Distribution, sort keys Sparse indexing
Caching Result cache, materialized view Result cache F3 (cache) (Aggregate index, join index)
Semi-structured data - native JSON functions within SQL Yes Limited Yes (Lambda)
Semi-structured data - native JSON storage type Yes Yes Yes
Semi-structured data - performance Fixed view Fixed view Fixed view, dynamic, changing data

Redshift does provide a result cache for accelerating repetitive query workloads and also has more tuning options than some others. But it does not deliver much faster compute performance than other cloud data warehouses in benchmarks. While its storage access is more efficient, with smaller data block sizes being fetched over the network, it does not perform a lot of query optimization, and has no support for indexes. It also has no support for semi-structured data or low-latency ingestion at any reasonable scale.

Snowflake is a perfect example of a modern storage and compute architecture that is not optimized for performance. Its data access is not optimized. Snowflake has no indexing for fetching the exact data it needs. It only keeps track of data ranges within each micro-partition, which range in size from 50 MB to 150 MB uncompressed, and can overlap. Whenever Snowflake does not have the data cached locally in the virtual warehouse, it has to fetch all of the micro-partitions that might have the data, which can take seconds or longer. While Snowflake does some query plan optimization, it does not show up in the performance of queries, which are 4-6000x slower than Firebolt in customer benchmarks. The two biggest explanations are a lack of indexing, and limited query plan optimization. However, Snowflake does provide result set caching across virtual warehouses in addition to SSD caching within each virtual warehouse. This does deliver solid performance for repetitive query workloads after the first query.

Firebolt has clearly demonstrated storage and compute optimization, along with indexing, make a big difference in performance. Benchmarks by Firebolt, customers and prospects have demonstrated 4-6000x performance gains across a wide range of queries compared to any of the alternatives. This comes in part from more efficient storage access, where its F3 format and remote data access only fetches the data needed, not entire partitions. Query optimization, combined with extensive indexing also make a big difference as demonstrated through specific query examples of the impact of primary, aggregating and join indexes. Choice of any size and number of nodes for each engine helps as well. Firebolt also added native semi-structured data support and continuous, low latency ingestion.

Snowflake vs. Redshift vs. Firebolt - Use cases

There are a host of different analytics use cases that can be supported by a data warehouse. Look at your legacy technologies and their workloads, as well as the new possible use cases, and figure out which ones you will need to support in the next few years. They include:

Snowflake Amazon Redshift Firebolt
Reporting Yes Yes Yes
Dashboards Fixed view Fixed view Fixed view, dynamic, changing data
Ad hoc Sec-min first-time query performance Sec-min first-time query performance Sub-sec first-time query performance
Operational or customer-facing analytics (high concurrency data) Limited continuous writes and concurrency, slow semi- structured data performance Limited continuous writes and concurrency, limited semi- structured data support Yes. Support continuous writes at scale, fast semi-structured performance
Data processing engine (Exports or publishes data) Export query results or table Unload data as Parquet Export query results
Data science/ML Spark, Arrow, Python connectors, integration with ML tools, export query results Invoke ML (SageMaker) in SQL Export query results

  • Reporting where relatively static reports are created by analysts against historical data, and used by executives, managers, and now increasingly by employees and customers
  • Dashboards created by analysts against historical or live data, and used by executives, managers, and increasingly by employees and customers via Web-based applications
  • Interactive and ad hoc analytics within dashboards or other tools for on-the-fly interactive analysis either by expert analysts, or increasingly by employees and customers via self-service
  • High performance analytics that require very large or complex queries with sub-second performance 
  • Big data analytics using semi-structured or unstructured data and complex queries or functionality
  • Operational and customer-facing analytics built by development teams that deliver historical and live data and analytics to larger groups of employees and customers

Redshift was originally designed to support traditional internal BI reporting and dashboard use cases for analysts. Without second-level performance, it cannot support any interactive and ad hoc analytics. It also has a limit of 50 queued queries, which limits concurrency, and a lack of support for continuous ingestion. All of these limitations mean Redshift for operational and customer-facing use cases.

Snowflake has broader support for use cases beyond traditional reporting and dashboards. Its decoupled storage and compute architecture enables you to isolate different workloads to meet SLAs, and it also supports high user concurrency. But Snowflake also does not provide interactive or ad hoc query performance because of inefficient data access along with a lack of extensive indexing and query optimization. Snowflake also cannot support streaming or low latency ingestion below one minute ingestion intervals. All of these limitations exclude Snowflake from many operational use cases and most customer-facing applications that require second-level performance.

Firebolt offers many of the same benefits as Snowflake with its decoupled storage and compute, particularly isolation of workloads and support for high user concurrency. It is the only cloud data warehouse that has optimized compute and storage together for faster ingestion, network and query performance. Its F3 format enables sub-second network access. Indexing and query optimization enables sub-second query performance. It uniquely enables continuous ingestion at scale as well. This makes Firebolt not only well suited for reporting and dashboards, but also much better for interactive and ad hoc use cases, as well as operational and customer-facing use cases.

Snowflake vs. Redshift vs. Firebolt - Cost

This is perhaps the strangest, and yet the clearest comparison; cost.

In the FiveTrain benchmark comparing Redshift, Snowflake, and BigQuery on cost, they were all roughly the same cost for the same performance. This has to do with competition; over time vendors do tend to converge on price-performance. Presto, which is free, will cost half for the infrastructure, but will require your own resources to maintain it. Athena will cost you $5 per scanned TB, which is great for occasional queries, but not necessarily good for regular workloads or big data.

Snowflake Amazon Redshift Firebolt
Administration - deployment, management Easy to deploy and resize. Strong performance visibility, limited tuning Easy to provision, harder to configure and manage Easy to deploy and resize, Easy to add indexing, change instance types
Choice - provision different cluster types on same data Yes No Yes
Choice - provision different number of nodes Yes Yes Yes
Compute - instance types 1-128 nodes, unknown types Any size, and types 1-128 choice of any types
Choice - provision different instance types No Yes(Limited) Yes
Pricing - compute $2-$4+ per node. Fast analytics need large cluster ($16-32+/hour) or greater. $0.25-13 per node on demand, 40% less for up front Choose any node side and number. Compute costs deliver 10x or greater price-performance advantage.
Pricing - storage All storage.$23/40 per TB per month on demand/up front Stored data only. RA3: $24 per TB per month. S3: AWS S3 costs. $23/TB
Pricing - transferred data None Spectrum: $5 per TB scanned (10MB min per query) None

Firebolt is the only vendor that has focused on improving both performance and price-performance. Posted customer comparisons have demonstrated a 30x or greater price performance advantage. This comes from a combination of improved query efficiency, greater choice of the best cluster sizes, and price transparency.  

Main differences between Snowflake, Redshift and Firebolt

There are a lot of differences in the details, but at a high level, the main differences should be clear. 

  • Redshift, while it is arguably the most mature and feature-rich, is also the most like a traditional data warehouse in its limitations. This makes it the hardest to manage, and costly overall for traditional reporting and dashboards, and not as well suited for the newer use cases.
  • Snowflake as a more modern cloud data warehouse with decoupled storage and compute is easier to manage for reporting and dashboards, and delivers strong user scalability. It also runs on more than AWS. But like the others, Snowflake does not deliver sub-second performance for ad hoc, interactive analytics at any reasonable scale, or support continuous ingestion well. It is also often very expensive to scale, especially for large data sets, complex queries and semi-structured data.
  • Firebolt is the only data warehouse with decoupled storage and compute that supports ad hoc and semi-structured data analytics with sub-second performance at scale. It also combines simplified administration with choice and control over instance types and 10x or greater efficiency for the best price-performance. This makes it the best choice for ad hoc, high performance, operational and customer-facing analytics.

Get started with Firebolt

Free credits. Free support. No credit card needed.

Read all the posts

Intrigued? Want to read some more?