BigQuery was one of the first decoupled storage and compute architectures. It is a unique piece of engineering and not a typical data warehouse, in part because it started as an on-demand serverless query engine. It runs multi-tenant over shared resources, which are allocated as “slots”, each representing a virtual CPU that executes SQL. BigQuery determines how many slots a query requires; the user has no way to control this allocation.
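To make the on-demand model concrete, here is a minimal sketch using the google-cloud-bigquery Python client (the project and table names are placeholders): a dry run reports how many bytes a query would scan, which is what on-demand pricing bills for, while the slot allocation itself never surfaces as a user-facing knob.

```python
# A dry run estimates bytes scanned without executing the query.
# Assumes google-cloud-bigquery is installed and GCP credentials are set;
# `my_project.analytics.events` is a hypothetical table.
from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
job = client.query(
    "SELECT user_id, COUNT(*) AS c "
    "FROM `my_project.analytics.events` GROUP BY user_id",
    job_config=job_config,
)
# Bytes scanned is observable; the number of slots used is not.
print(f"Query would scan {job.total_bytes_processed / 1e9:.2f} GB")
```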
Snowflake was one of the first decoupled storage and compute architectures, giving it nearly unlimited compute scale, workload isolation, and horizontal scalability for concurrent users. It runs on AWS, Azure and GCP. It is multi-tenant over shared resources by nature and requires you to move data out of your VPC and into the Snowflake cloud. “Virtual Private Snowflake” (VPS), its highest-priced tier, runs a dedicated, isolated deployment of Snowflake. Its virtual warehouses are T-shirt sized along an XS/S/M…/4XL axis, where each discrete T-shirt size is bundled with fixed hardware properties that are abstracted away from the user.
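For contrast, here is a minimal sketch of provisioning one of those T-shirt-sized warehouses through the snowflake-connector-python package; the connection parameters and warehouse name are placeholders.

```python
# Provision a T-shirt-sized virtual warehouse; the size is the only
# hardware-related choice Snowflake exposes.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="..."  # placeholders
)
conn.cursor().execute(
    """
    CREATE WAREHOUSE IF NOT EXISTS reporting_wh
      WAREHOUSE_SIZE = 'MEDIUM'  -- one discrete step on the XS..4XL axis
      AUTO_SUSPEND = 300         -- suspend (and stop billing) after 5 idle minutes
      AUTO_RESUME = TRUE
    """
)
```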
BigQuery scales very well to large data volumes, automatically assigning more compute resources (slots) behind the scenes when needed. It offers either an “on-demand” pricing model, where slot assignment is entirely in the hands of BigQuery and the state of the shared resource pool, or a “flat-rate” pricing model, where slots are reserved in advance. Reserved slots give you more control over compute resources, making scaling more predictable. Concurrency is limited to 100 concurrent queries by default.
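You can observe, after the fact, how much slot time a query consumed. A sketch, reusing the hypothetical table from above; slot_millis is a statistic the google-cloud-bigquery client exposes on completed jobs:

```python
# Run a query, then inspect the slot-milliseconds BigQuery chose to spend.
from google.cloud import bigquery

client = bigquery.Client()
job = client.query("SELECT COUNT(*) FROM `my_project.analytics.events`")
job.result()  # block until the query finishes

# The allocation was decided by BigQuery (or drawn from a flat-rate
# reservation, if one is assigned to the project) -- it is reported,
# not requested.
print(f"Slot-seconds consumed: {job.slot_millis / 1000:.1f}")
```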
Snowflake scales very well for both data volumes and query concurrency. The decoupled storage/compute architecture supports resizing warehouses without downtime and, in addition, auto-scaling horizontally for higher query concurrency during peak hours.
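Both scaling axes are plain DDL statements. A sketch against the placeholder warehouse from the earlier example (multi-cluster warehouses assume an Enterprise-tier account):

```python
# Vertical and horizontal scaling on an existing warehouse.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="..."  # placeholders
)
cur = conn.cursor()

# Vertical: bump the T-shirt size; running queries are not interrupted.
cur.execute("ALTER WAREHOUSE reporting_wh SET WAREHOUSE_SIZE = 'LARGE'")

# Horizontal: let Snowflake spin up to 3 extra same-size clusters when
# concurrent queries start to queue, and shed them as load drops.
cur.execute(
    """
    ALTER WAREHOUSE reporting_wh SET
      MIN_CLUSTER_COUNT = 1
      MAX_CLUSTER_COUNT = 4
      SCALING_POLICY = 'STANDARD'
    """
)
```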
In benchmarks, BigQuery lands in the same ballpark as other cloud data warehouses, but comes in consistently last on most queries. Beyond implementing best practices, there is little you can do to accelerate BigQuery’s performance, as it determines the amount of resources (slots) a query needs for you. BigQuery can be paired with “BigQuery BI Engine” for lower-latency analytics; however, BI Engine runs in memory and is therefore limited in scale, with a maximum capacity of 100 GB.
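The main lever you do control is table layout, so that queries scan less data in the first place. A sketch, with hypothetical partitioning and clustering columns:

```python
# Partitioning + clustering is the standard best-practice lever for
# BigQuery performance; the table and column names are illustrative.
from google.cloud import bigquery

client = bigquery.Client()
client.query(
    """
    CREATE TABLE `my_project.analytics.events_by_day`
    PARTITION BY DATE(event_ts)   -- prune whole days at query time
    CLUSTER BY user_id            -- co-locate rows for selective lookups
    AS SELECT * FROM `my_project.analytics.events`
    """
).result()
```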
Snowflake typically comes out on top for most queries in public TPC-based benchmarks against BigQuery and Redshift, but only marginally. Its micro-partition storage approach effectively scans less data than coarser partitioning schemes. The ability to isolate workloads over the decoupled storage & compute architecture lets you avoid the competition for resources seen in multi-tenant shared-resource solutions, and increasing warehouse sizes can often enhance performance (for a higher price), though not always linearly. Snowflake’s recently released “search optimization service” delivers index-like behavior for point queries, but comes at an additional cost.
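Enabling the service is a one-line table alteration; a sketch with a placeholder table (the feature assumes an Enterprise-tier account and bills for its own storage and maintenance):

```python
# Turn on search optimization for a table, then issue the kind of
# selective point query it is designed to accelerate.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="..."  # placeholders
)
cur = conn.cursor()
cur.execute("ALTER TABLE analytics.events ADD SEARCH OPTIMIZATION")

# With the service active, this lookup can skip most micro-partitions.
cur.execute("SELECT * FROM analytics.events WHERE session_id = 'abc-123'")
```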
BigQuery is a mature general-purpose data warehouse that lends itself well to internal BI & reporting. Being serverless in nature and tightly integrated with GCP makes it very convenient for ad-hoc analytics and ML use cases on GCP. On the other hand, because BigQuery makes resource allocation decisions for you, it is not always the best fit for operational use cases and data apps where performance needs to be consistent and predictable.
Snowflake is a well-rounded general-purpose cloud data warehouse that also spans beyond traditional BI & analytics into ad-hoc and ML use cases. Thanks to the flexible decoupled storage & compute architecture, which lets you isolate and control the amount of compute per workload, it can tackle a broad spectrum of workloads. However, like its close siblings Redshift & BigQuery, it struggles to deliver low-latency query performance at scale, making it a lesser fit for operational use cases and customer-facing data apps.