Snowflake was one of the first decoupled storage and compute architectures, making it the first to have nearly unlimited compute scale and workload isolation, and horizontal user scalability. It runs on AWS, Azure and GCP. It is multi-tenant over shared resources in nature and requires you to move data out of your VPC and into the Snowflake cloud. “Virtual Private Snowflake” (VPS) is its highest-priced tier, and can run a dedicated isolated version of Snowflake. Its virtual warehouses can be T-shirt sized along an XS/S/M…/4XL axis, where each discrete T-shirt size is bundled with fixed HW properties that are abstracted from the users.
BigQuery was one of the first decoupled storage and compute architectures. It is a unique piece of engineering and not a typical data warehouse in part because it started as an on-demand serverless query engine. It runs in multi-tenancy with shared resources, allocated as “slots” which represent a virtual CPU that executes SQL. BigQuery determines how many slots a query requires, without the ability of the user to control it.
Snowflake scales very well both for data volumes and query concurrency. The decoupled storage/compute architecture supports resizing clusters without downtime, and in addition, supports auto-scaling horizontally for higher query concurrency during peak hours.
BigQuery scales very well to large data volumes, and automatically assigns more compute resources when needed behind the scenes, in the form of “slots”. BigQuery works either in an “on-demand pricing model”, where slot assignment is completely in the hands of BigQuery and the state of the shared resource pool, or in “flat-rate pricing model” where slots are reserved in advanced. With reserved slots there is more control over compute resources, thus making scaling more predictable. Concurrency is limited to 100 users by default.
Snowflake typically comes on top for most queries when it comes to performance in public TPC-based benchmarks when compared to BigQuery and Redshift, but only marginally. Its micro partition storage approach effectively scans less data compared to larger partitions. The ability to isolate workloads over the decoupled storage & compute architecture lets you avoid competition for resources compared to multi-tenant shared resource solutions, and the ability to increase warehouse sizes can often enhance performance (for a higher price), but not always linearly. Snowflake’s recently released “Search optimization service” delivers index-like behavior for point queries, but comes at an additional cost.
BigQuery lines up in benchmarks in the same ballpark as other cloud data warehouses but does come in consistently last in most queries. Beyond implementing according to best practices, there is little you can do to accelerate BigQuery performance, as it determines the amount of resources (slots) the query needs for you. BigQuery can be used together with “BigQuery BI Engine” for lower latency analytics. However, BI Engine is limited in terms of scale because it runs in memory. Its maximum capacity is 100GB.
Snowflake has broader support for use cases beyond traditional reporting and dashboards. Its decoupled storage and compute architecture enables you to isolate different workloads to meet SLAs, and it also supports high user concurrency. But Snowflake also does not provide interactive or ad hoc query performance because of inefficient data access along with a lack of extensive indexing and query optimization. Snowflake also cannot support streaming or low latency ingestion below one minute ingestion intervals. All of these limitations exclude Snowflake from many operational use cases and most customer-facing applications that require second-level performance.
BigQuery, like Snowflake, has broader support for use cases beyond reporting and dashboards. You can isolate workloads by assigning each workload to different reserved slots. Unlike Snowflake, Redshift, or Athena, BigQuery also supports low latency streaming. But like these other three technologies. BigQuery also lacks the performance to support interactive or ad hoc queries at scale. This eliminates BigQuery from being a great option for many operational and customer-facing use cases where the users demand a few seconds of wait at worst, which translates to sub-second query times for the data warehouse.