Send me as a pdf

Redshift vs Snowflake

A detailed comparison

Compare Redshift vs Snowflake by the following set of categories:


Redshift
has the oldest architecture, being the first Cloud DW in the group. Its architecture wasn’t designed to separate storage & compute. While it now has RA3 nodes which allow you to scale compute and only cache the data you need locally, all compute still operates together. You cannot separate and isolate different workloads over the same data, which puts it behind other decoupled storage/compute architectures. Redshift runs as an isolated tenant per customer, and unlike other cloud data warehouses, it is deployed in your VPC.

Snowflake was one of the first decoupled storage and compute architectures, making it the first to have nearly unlimited compute scale and workload isolation, and horizontal user scalability. It runs on AWS, Azure and GCP. It is multi-tenant over shared resources in nature and requires you to move data out of your VPC and into the Snowflake cloud. “Virtual Private Snowflake” (VPS) is its highest-priced tier, and can run a dedicated isolated version of Snowflake. Its virtual warehouses can be T-shirt sized along an XS/S/M…/4XL axis, where each discrete T-shirt size is bundled with fixed HW properties that are abstracted from the users.

Redshift vs Snowflake - Architecture

The biggest difference among cloud data warehouses are whether they separate storage and compute, how much they isolate data and compute, and what clouds they can run on.


Redshift
is limited in scale because even with RA3, it cannot distribute different workloads across clusters. While it can scale to up to 10 clusters automatically to support query concurrency, it can only handle a maximum of 50 queued queries across all clusters by default. 

Snowflake scales very well both for data volumes and query concurrency. The decoupled storage/compute architecture supports resizing clusters without downtime, and in addition, supports auto-scaling horizontally for higher query concurrency during peak hours.

Redshift vs Snowflake - Scalability

There are three big differences among data warehouses and query engines that limit scalability: decoupled storage and compute, dedicated resources, and continuous ingestion.


Redshift
does provide a result cache for accelerating repetitive query workloads and also has more tuning options than some others. But it does not deliver much faster compute performance than other cloud data warehouses in benchmarks. Sort keys can be used to optimize performance, but their contribution is limited. There is no support for indexes, and low-latency analytics at large data volumes is hard to achieve. Because Redshift decoupling of storage & compute is limited compared to other cloud data warehouses, it doesn’t support isolating workloads, which means performance can degrade under pressure and competition for resources.

Snowflake typically comes on top for most queries when it comes to performance in public TPC-based benchmarks when compared to BigQuery and Redshift, but only marginally. Its micro partition storage approach effectively scans less data compared to larger partitions. The ability to isolate workloads over the decoupled storage & compute architecture lets you avoid competition for resources compared to multi-tenant shared resource solutions, and the ability to increase warehouse sizes can often enhance performance (for a higher price), but not always linearly. Snowflake’s recently released “Search optimization service” delivers index-like behavior for point queries, but comes at an additional cost.

Redshift vs Snowflake - Performance

Performance is the biggest challenge with most data warehouses today.
While decoupled storage and compute architectures improved scalability and simplified administration, for most data warehouses it introduced two bottlenecks; storage, and compute. Most modern cloud data warehouses fetch entire partitions over the network instead of just fetching the specific data needed for each query. While many invest in caching, most do not invest heavily in query optimization. Most vendors also have not improved continuous ingestion or semi-structured data analytics performance, both of which are needed for operational and customer-facing use cases.


Redshift
was originally designed to support traditional internal BI reporting and dashboard use cases for analysts. As such, it is typically used as a general-purpose Enterprise data warehouse. With deep integrations into the AWS ecosystem, it can also leverage AWS ML service, making it also useful for ML projects. However, given the coupling of storage & compute, and the difficulty in delivering low-latency analytics at scale, it is less suited for operational use cases and customer-facing use cases like Data Apps. The coupling of storage and compute, together with the need to predefine sort & dist keys for optimal performance, make it challenging to use for Ad-Hoc analytics. 

Snowflake is a well rounded general purpose cloud data warehouse, that can also span beyond traditional BI & Analytics use cases into Ad-Hoc and ML use cases. Thanks to the flexible decoupeld storage & compute architecture that allows you to isolate and control the amount of compute per workload, it’s possible to tackle a broad spectrum of workloads. However, like its close siblings Redshift & BigQuery, it struggles to deliver low-latency query performance at scale, making it a lesser fit for operational use cases and customer-facing data apps.

Redshift vs Snowflake - Use cases

There are a host of different analytics use cases that can be supported by a data warehouse. Look at your legacy technologies and their workloads, as well as the new possible use cases, and figure out which ones you will need to support in the next few years.

Compare other data warehouses

See all data warehouse comparisons ->

Talk to a Firebolt solution architect