Snowflake vs BigQuery

A detailed comparison

Compare Snowflake vs BigQuery by the following set of categories:

| Category | Snowflake | BigQuery |
| --- | --- | --- |
| Elasticity - separation of storage and compute | Yes | Yes |
| Supported cloud infrastructure | AWS, Azure, Google Cloud | Google Cloud only |
| Isolated tenancy - option for dedicated resources | Multi-tenant resources (only VPS is dedicated, on an isolated Snowflake account) | Multi-tenant on-demand and reserved resources only |
| Compute - node types | 1-128 nodes, unknown types | No choice over (fixed) slot size |
| Data - internal/external, writable storage | External tables supported | External tables supported; 4 concurrent queries by default |
| Security - data | Separate customer keys (only VPS is isolated tenant), encryption at rest, RBAC | Separate customer keys, column-level encryption, encryption at rest, AEAD individual value encryption, RBAC |
| Security - network | Firewall, SSL, PrivateLink, whitelist/blacklist control, isolated for VPS only | TLS, firewall (Google Cloud), VPN, whitelist/blacklist control as part of GCP |

Snowflake had one of the first decoupled storage and compute architectures, making it among the first to offer nearly unlimited compute scale, workload isolation, and horizontal user scalability. It runs on AWS, Azure, and GCP, and while by default its compute and data are multi-tenant, it can run as an isolated Virtual Private Snowflake (VPS) deployment. But it only provides clusters of 1, 2, 4, … 128 nodes with no choice of node size: to get the biggest nodes you need to choose the biggest cluster.
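The fixed cluster-size ladder can be sketched as a simple lookup. The node counts below follow Snowflake's commonly documented t-shirt sizing (XS through 4XL, doubling at each step); treat them as illustrative, not authoritative.

```python
# Sketch: Snowflake warehouse sizes double the node count at each step.
# Counts are the commonly documented t-shirt sizes; verify against current docs.
WAREHOUSE_NODES = {
    "XS": 1, "S": 2, "M": 4, "L": 8,
    "XL": 16, "2XL": 32, "3XL": 64, "4XL": 128,
}

def nodes_for(size: str) -> int:
    """Return the node count for a warehouse size label."""
    return WAREHOUSE_NODES[size]

# There is no separate knob for node size: more per-node power is only
# available by stepping up the whole cluster.
assert nodes_for("4XL") == 128
```

The point of the sketch is the shape of the ladder: cluster size is the only tuning dimension, so a query that needs one bigger node pays for an entire bigger cluster.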

BigQuery, released before Snowflake, also had one of the first decoupled storage and compute architectures. It is a unique piece of engineering and not a typical data warehouse, in part because it started as an on-demand serverless query engine. While its petabit network dramatically lowers the network latency of data access for any given compute step, the additional network traffic from transferring and caching data in shared memory over the network after each slot finishes its job, instead of in a local cache, seems to eliminate any major advantage in actual benchmarks. If BigQuery does start to cache locally on slots, watch out Firebolt: you might have some closer competition.

Snowflake vs BigQuery - Architecture

The biggest differences among cloud data warehouses are whether they separate storage and compute, how much they isolate data and compute, and which clouds they can run on.

| Category | Snowflake | BigQuery |
| --- | --- | --- |
| Elasticity - (individual) query scalability | 1-click cluster resize, no choice of node size | Automatic allocation of each query to on-demand, reserved, or flex slots |
| Elasticity - user (concurrent query) scalability | Autoscale up to 10 warehouses; limited to 20 DML writes in queue per table | Limited to 100 concurrent users by default* |
| Write scalability - batch | Strong | 1,500 load jobs/day (~1 per minute), 100,000 per project, 15 TB per job, 6-hour max time* |
| Write scalability - continuous | Limited to 20 DML writes in queue per table; 1-minute or greater ingestion latency recommended | 1 GB/sec with no dedup, 100 MB/sec with dedup, 100K rows per second per table, 100K-500K per project by default* |
| Data scalability | No specified storage limit; up to a 4XL data warehouse (128 nodes) | No real limit |

Snowflake delivers strong scalability with its decoupled storage and compute architecture. But it scales inefficiently for queries that need larger nodes, because the only way to get larger nodes is to provision larger clusters. It is also better suited to batch writes, because each write requires entire micro-partitions to be rewritten. Snowflake also transfers entire micro-partitions over the network, which creates a bottleneck at scale.
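Because each write rewrites whole micro-partitions, continuous feeds into Snowflake are typically buffered into batches flushed at one-minute or longer intervals, as recommended above. A minimal sketch of that buffering, with illustrative names and data:

```python
from datetime import datetime

def batch_events(events):
    """Group (timestamp, payload) events into 1-minute-aligned write batches.

    Buffering like this amortizes the cost of rewriting a micro-partition
    across many rows instead of paying it on every single-row write.
    """
    batches = {}
    for ts, payload in events:
        window = ts.replace(second=0, microsecond=0)  # start of the minute
        batches.setdefault(window, []).append(payload)
    return batches

events = [
    (datetime(2021, 1, 1, 12, 0, 10), "a"),
    (datetime(2021, 1, 1, 12, 0, 50), "b"),
    (datetime(2021, 1, 1, 12, 1, 5), "c"),
]
batches = batch_events(events)
# Two 1-minute windows: 12:00 holds "a" and "b"; 12:01 holds "c".
```

This is exactly the latency trade-off in the table: the warehouse stays efficient, but freshly arrived rows wait up to a minute before they are queryable.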

BigQuery on demand has several official limitations* intended to protect everyone else using the on-demand pool from a rogue account or query. But you can easily get around these limitations by switching to reserved slots and requesting higher limits. BigQuery is in production at very large scale at several companies. Even the limits on message-based ingestion are not an issue: BigQuery ingests into memory first and commits to storage later, which is a better architecture than Snowflake, Redshift, or Athena. Nevertheless, it is still more of a shared service than Snowflake or Redshift, which means it can theoretically hit shared limits.

Snowflake vs BigQuery - Scalability

There are three big differences among data warehouses and query engines that limit scalability: decoupled storage and compute, dedicated resources, and continuous ingestion.

| Category | Snowflake | BigQuery |
| --- | --- | --- |
| Indexes | None | None |
| Query optimization - performance | Cost-based optimization, vectorization | Cost-based optimization |
| Tuning - CPU, RAM, SSD, storage | Can only choose warehouse size, not node types | Can only purchase reserved or flex slots |
| Storage format | Optimized micro-partition storage (S3), separate RAM | Optimized (Capacitor) on Colossus |
| Ingestion performance | Batch-centric (micro-partition-level locking, limit of 20 queued writes per table) | Writes 1 row at a time; limit of 100K messages/sec by default* |
| Ingestion latency | Batch write preferred (1+ minute interval); requires rewrite of entire micro-partition | Immediately visible during ingestion |
| Partitioning | Micro-partitions / pruning, cluster keys | Partitions, pruning |
| Caching | Result cache, materialized views | Result cache (24 hours), shared memory |
| Semi-structured data - native JSON functions within SQL | Yes | Yes |
| Semi-structured data - native JSON storage type | Limited VARIANT (single field) | Can store as strings or STRUCT, but requires UDFs for compute |
| Semi-structured data - performance | Slow (can require full load into RAM, full scan) | JSON strings slow; can store as STRUCT and use UDFs (JavaScript) |

Snowflake has a modern storage and compute architecture that is not completely optimized for performance. Its data access is not optimized: Snowflake has no indexing for fetching exactly the data it needs. It only keeps track of value ranges within each micro-partition, which ranges in size from 50 MB to 150 MB uncompressed, and those ranges can overlap. Whenever Snowflake does not have the data cached locally in the virtual warehouse, it has to fetch all of the micro-partitions that might contain the data, which can take seconds or longer. While Snowflake does some query plan optimization, it does not show up in query performance, which was 4x-6,000x slower than Firebolt in customer benchmarks. The three biggest reasons are inefficient data access, a lack of indexing, and less query plan optimization. However, Snowflake does provide result set caching across virtual warehouses in addition to SSD caching within each virtual warehouse, which delivers solid performance for repetitive query workloads after the first query.
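The pruning behavior described above can be illustrated with a toy model: each micro-partition only records the min/max range of a column, and any partition whose range might contain the sought value must be fetched. The partition layout below is hypothetical.

```python
# Sketch: range-based pruning without indexes. Each micro-partition keeps
# only a (min, max) range per column; overlapping ranges force extra fetches.
partitions = {
    "p1": (0, 100),    # hypothetical (min, max) of some column
    "p2": (50, 150),   # overlaps p1
    "p3": (200, 300),
}

def partitions_to_scan(value, parts):
    """Every partition whose [min, max] range might contain `value`."""
    return [name for name, (lo, hi) in parts.items() if lo <= value <= hi]

# A point lookup for 75 cannot be narrowed to one partition: both p1 and p2
# must be fetched over the network and scanned.
hits = partitions_to_scan(75, partitions)
```

With an index, the same lookup could go straight to the rows that match; with only overlapping ranges, every candidate partition has to be pulled and scanned in full.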

BigQuery has not demonstrated significantly better performance or price-performance compared to Snowflake or Redshift. While remote storage access is much faster using the Jupiter petabit network, the constant writing to and fetching from shared memory over the network for each stage of the query execution (in the DAG) seems to eliminate that advantage. So does the fact that BigQuery does not use indexing. It means slots still have to process all the data stored in larger segments without filtering down to smaller (sorted) ranges. However, BigQuery does have lower latency for message-based ingestion since it does in fact ingest one row at a time and make it immediately available for querying.

Snowflake vs BigQuery - Performance

Performance is the biggest challenge with most data warehouses today.
While decoupled storage and compute architectures improved scalability and simplified administration, for most data warehouses they introduced two bottlenecks: storage and compute. Most modern cloud data warehouses fetch entire partitions over the network instead of just fetching the specific data needed for each query. While many invest in caching, most do not invest heavily in query optimization. Most vendors also have not improved continuous ingestion or semi-structured data analytics performance, both of which are needed for operational and customer-facing use cases.

| Category | Snowflake | BigQuery |
| --- | --- | --- |
| Reporting | Yes | Yes |
| Dashboards | Fixed view | Fixed view |
| Ad hoc | Sec-min first-time query performance | Sec-min first-time query performance |
| Operational or customer-facing analytics (high concurrency, continuously updating / streaming data) | Slower query performance; limited continuous writes and concurrency; slow semi-structured data performance | Slower query performance; limited to 100K continuous writes/table, 100 concurrent users by default* |
| Data processing engine (exports or publishes data) | Export query results or tables | 1 GB max export file size, exports to Google Cloud only* |
| Data science/ML | Spark, Arrow, Python connectors, integration with ML tools, export query results | BigQuery ML |

Snowflake has broader support for use cases beyond traditional reporting and dashboards. Its decoupled storage and compute architecture enables you to isolate different workloads to meet SLAs, and it supports high user concurrency. But Snowflake does not provide interactive or ad hoc query performance, because of inefficient data access combined with a lack of extensive indexing and query optimization. Snowflake also cannot support streaming or low-latency ingestion at intervals below one minute. These limitations exclude Snowflake from many operational use cases and most customer-facing applications that require second-level performance.

BigQuery, like Snowflake, has broader support for use cases beyond reporting and dashboards. You can isolate workloads by assigning each workload to different reserved slots. Unlike Snowflake, Redshift, or Athena, BigQuery also supports low-latency streaming. But like these other three technologies, BigQuery lacks the performance to support interactive or ad hoc queries at scale. This eliminates BigQuery as a great option for many operational and customer-facing use cases where users demand a few seconds of wait at worst, which translates to sub-second query times for the data warehouse.

Snowflake vs BigQuery - Use cases

There are a host of different analytics use cases that can be supported by a data warehouse. Look at your legacy technologies and their workloads, as well as the new possible use cases, and figure out which ones you will need to support in the next few years. They include:

- Reporting, where relatively static reports are created by analysts against historical data, and used by executives, managers, and increasingly by employees and customers
- Dashboards, created by analysts against historical or live data, and used by executives, managers, and increasingly by employees and customers via Web-based applications
- Interactive and ad hoc analytics within dashboards or other tools, for on-the-fly interactive analysis by expert analysts, or increasingly by employees and customers via self-service
- High-performance analytics that require very large or complex queries with sub-second performance
- Big data analytics using semi-structured or unstructured data and complex queries or functionality
- Operational and customer-facing analytics, built by development teams, that deliver historical and live data and analytics to larger groups of employees and customers

| Category | Snowflake | BigQuery |
| --- | --- | --- |
| Administration - deployment, management | Easy to deploy and resize; strong performance visibility, limited tuning | No administration or tuning |
| Choice - provision different cluster types on same data | Choice of fixed-size warehouses | Yes |
| Choice - provision different number of nodes | Yes | Up to 2,000 flex (on-demand) slots; purchase reserved or flex slots 100 at a time with no limits |
| Choice - provision different node types | No | No |
| Pricing - compute | $2-$4+ per node; fast analytics need a large cluster ($16-$32+/hour or greater) | On demand: $5/TB data processed; flex slots: $4 per 100 slots per hour, $1,700/month per 100 slots |
| Pricing - storage | All storage: $23/$40 per TB per month (on demand/upfront) | $20/TB active storage, $10/TB inactive |
| Pricing - transferred data | None | Batch is free; streaming ingest $0.01 per 200 MB ($50/TB), streaming reads $1.10/TB |

Snowflake, as a more modern cloud data warehouse with decoupled storage and compute, is easier to manage for reporting and dashboards, and delivers strong user scalability. It also runs on more clouds than AWS alone. But like the others, Snowflake does not deliver sub-second performance for ad hoc, interactive analytics at any reasonable scale, or support continuous ingestion well. It is also often very expensive to scale, especially for large data sets, complex queries, and semi-structured data.

BigQuery has three different pricing models: on demand, reserved, and flex pricing. If you need a data warehouse, you probably should not be using on-demand pricing unless each query scans relatively little data. You should be using reserved slots, with flex slots to absorb workload variations and reduce costs. When you do, your costs will not be far off from Snowflake or Redshift for regular data warehouse workloads. BigQuery also gives you the option to support infrequent analytics, more in line with Athena. In other words, it is the best of both more traditional worlds. Nevertheless, BigQuery's price-performance is in line with Snowflake and Redshift, which is up to 10x more expensive than Firebolt.
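The on-demand versus reserved trade-off reduces to simple arithmetic, using the list prices quoted above ($5/TB scanned on demand, $1,700/month per 100 slots). The sketch below computes the break-even scan volume; the figures are the ones in this article, not current list prices.

```python
# Sketch: break-even between BigQuery on-demand and reserved/flex pricing,
# using the prices quoted in the comparison table above.
ON_DEMAND_PER_TB = 5.0          # USD per TB scanned (on demand)
MONTHLY_PER_100_SLOTS = 1700.0  # USD per month per 100 reserved/flex slots

def on_demand_cost(tb_scanned_per_month: float) -> float:
    """Monthly on-demand cost for a given scan volume."""
    return tb_scanned_per_month * ON_DEMAND_PER_TB

# Scan more than this per month and 100 reserved slots are cheaper
# than paying per TB on demand.
break_even_tb = MONTHLY_PER_100_SLOTS / ON_DEMAND_PER_TB  # 340 TB/month
```

At 340 TB scanned per month the two models cost the same; below that, on demand wins, and above it, reservations win, which is why the article recommends reserved slots for sustained data warehouse workloads.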

Snowflake vs BigQuery - Cost

This is perhaps the strangest and yet the clearest comparison: cost. There are a lot of differences in the details, but at a high level the main differences should be clear.
