Snowflake was one of the first decoupled storage and compute architectures, making it the first to offer nearly unlimited compute scale, workload isolation, and horizontal user scalability. It runs on AWS, Azure, and GCP, and while by default it is multi-tenant for both compute and data, it can run in a dedicated Snowflake VPC. But it only provides clusters of 1, 2, 4, up to 128 nodes, with no choice of node sizes. To get the biggest nodes, you need to choose the biggest cluster.
Redshift has the oldest architecture, but also the best options. It does not separate storage and compute. While its newer RA3 nodes let you scale compute and cache only the data you need locally, all compute still operates together; you cannot separate workloads. Redshift runs only on AWS, but there it offers the most options, including the ability to deploy it in your own VPC.
Snowflake delivers strong scalability with its decoupled storage and compute architecture. But it scales inefficiently for queries that need larger nodes, because the only way to get larger nodes is to provision larger clusters. It is also better suited to batch writes and upserts, because each write requires entire micro-partitions to be rewritten. Snowflake also transfers entire micro-partitions over the network, which becomes a bottleneck at scale.
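The write-amplification effect of rewriting whole micro-partitions can be sketched with a few lines of arithmetic. This is a minimal illustration, not Snowflake's implementation: the partition size and row size below are assumed values chosen for the example (micro-partitions fall in the 50-150 MB uncompressed range).

```python
PARTITION_MB = 100        # assumed micro-partition size (50-150 MB uncompressed)
ROW_KB = 1                # assumed average row size, for illustration only

def rewritten_mb(partitions_touched: int) -> int:
    """Every touched micro-partition is rewritten in full (copy-on-write)."""
    return partitions_touched * PARTITION_MB

def amplification(rows_changed: int, partitions_touched: int) -> float:
    """Bytes physically rewritten per byte logically changed."""
    logical_mb = rows_changed * ROW_KB / 1024
    return rewritten_mb(partitions_touched) / logical_mb

# A single-row upsert landing in one partition rewrites ~100 MB of data,
# while a large batch packed into the same partition amortizes the cost:
single = amplification(rows_changed=1, partitions_touched=1)
batch = amplification(rows_changed=100_000, partitions_touched=1)
```

Under these assumptions a one-row upsert rewrites roughly 100,000x more data than it logically changes, while a 100,000-row batch approaches a 1:1 ratio, which is why batch writes suit this design better than frequent small writes.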
Redshift is limited in scale because, even with RA3, it cannot distribute different workloads across clusters. While it can automatically scale out to 10 clusters to support query concurrency, it can only hold a maximum of 50 queued queries across all clusters. In addition, because it locks at the table level, it is better suited to batch ingestion and limited in its write/upsert throughput.
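The concurrency ceiling described above can be modeled as a simple admission-control calculation. This is a toy sketch using only the limits stated here (10 clusters, 50 queued queries total); the number of concurrent query slots per cluster is a hypothetical value, since actual slot counts depend on workload management configuration.

```python
MAX_CLUSTERS = 10         # clusters available via concurrency scaling (per the text)
SLOTS_PER_CLUSTER = 15    # hypothetical concurrent query slots per cluster
MAX_QUEUED = 50           # queue cap across all clusters (per the text)

def admit(incoming: int) -> dict:
    """Split an incoming burst of queries into running, queued, and rejected."""
    running = min(incoming, MAX_CLUSTERS * SLOTS_PER_CLUSTER)
    queued = min(incoming - running, MAX_QUEUED)
    rejected = incoming - running - queued
    return {"running": running, "queued": queued, "rejected": rejected}

burst_ok = admit(180)     # 150 running, 30 queued, nothing rejected
burst_over = admit(250)   # queue caps at 50; the remaining 50 queries fail
```

The point of the sketch: once every slot is busy and the shared 50-query queue is full, additional queries cannot wait, no matter how many clusters are running.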
Snowflake is a perfect example of a modern storage and compute architecture that is not optimized for performance. Its data access is not optimized: Snowflake has no indexing for fetching exactly the data it needs. It only keeps track of value ranges within each micro-partition, which range in size from 50 MB to 150 MB uncompressed and can overlap. Whenever Snowflake does not have the data cached locally in the virtual warehouse, it has to fetch all of the micro-partitions that might contain the data, which can take seconds or longer. While Snowflake does some query plan optimization, it does not show up in query performance, which has been 4x to 6,000x slower than Firebolt in customer benchmarks. The two biggest explanations are the lack of indexing and limited query plan optimization. Snowflake does, however, provide result set caching across virtual warehouses in addition to SSD caching within each virtual warehouse, which delivers solid performance for repetitive query workloads after the first query.
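The pruning behavior described above can be sketched as min-max range filtering. This is an illustrative model, not Snowflake's code: with only per-partition value ranges and no secondary index, any partition whose range could contain the key must be fetched, and overlapping ranges multiply the fetches. The partition names and ranges are made up for the example.

```python
# Hypothetical per-partition (min, max) ranges for the filter column.
partitions = {
    "p1": (0, 500),
    "p2": (400, 900),    # overlaps p1: both might hold a key like 450
    "p3": (901, 1500),
}

def candidates(key: int) -> list:
    """Partitions that must be fetched for `key` given only min-max ranges."""
    return [name for name, (lo, hi) in partitions.items() if lo <= key <= hi]

# A point lookup for key 450 cannot be narrowed to one partition:
overlap_hit = candidates(450)      # both p1 and p2 must be fetched in full
clean_hit = candidates(1000)       # only p3 qualifies
```

Fetching two 50-150 MB micro-partitions over the network to find one row is the access pattern that makes uncached point lookups take seconds, and it is the gap an index would close.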
Redshift does provide a result cache for accelerating repetitive query workloads, and it has more tuning options than some others. But it does not deliver much faster compute performance than other cloud data warehouses in benchmarks. While its storage access is more efficient, with smaller data blocks fetched over the network, it does not perform much query optimization and has no support for indexes. It also has no support for semi-structured data or for low-latency ingestion at any reasonable scale.
Snowflake has broader support for use cases beyond traditional reporting and dashboards. Its decoupled storage and compute architecture enables you to isolate different workloads to meet SLAs, and it also supports high user concurrency. But Snowflake does not provide interactive or ad hoc query performance, because of inefficient data access combined with a lack of extensive indexing and query optimization. Snowflake also cannot support streaming or low-latency ingestion below one-minute intervals. All of these limitations exclude Snowflake from many operational use cases and from most customer-facing applications that require second-level performance.
Redshift was originally designed to support traditional internal BI reporting and dashboard use cases for analysts. Without second-level performance, it cannot support interactive and ad hoc analytics. It also has a limit of 50 queued queries, which constrains concurrency, and it lacks support for continuous ingestion. All of these limitations rule out Redshift for operational and customer-facing use cases.
Snowflake, as a more modern cloud data warehouse with decoupled storage and compute, is easier to manage for reporting and dashboards, and it delivers strong user scalability. It also runs on more clouds than just AWS. But like the others, Snowflake does not deliver sub-second performance for ad hoc, interactive analytics at any reasonable scale, nor does it support continuous ingestion well. It is also often very expensive to scale, especially for large data sets, complex queries, and semi-structured data.
Redshift, while arguably the most mature and feature-rich, is also the most like a traditional data warehouse in its limitations. This makes it the hardest to manage, costly overall for traditional reporting and dashboards, and not well suited for the newer use cases.