BigQuery was one of the first decoupled storage and compute architectures, released before Snowflake. It is a unique piece of engineering and not a typical data warehouse, in part because it started as an on-demand serverless query engine. Its petabit network dramatically lowers the latency of data access for any given compute step. But after each slot finishes its job, BigQuery transfers and caches data in shared memory over the network instead of in a local cache, and that additional network traffic seems to eliminate any major advantage in actual benchmarks. If BigQuery does start to cache locally on slots, watch out Firebolt, you might have some closer competition.
Redshift has the oldest architecture with the best options. It does not separate storage and compute. While it now has RA3 nodes which allow you to scale compute and only cache the data you need locally, all compute still operates together. You cannot separate workloads. While you can only run Redshift as an isolated workload on AWS, it has the most options on AWS, including the ability to deploy it in your own VPC.
BigQuery on demand has several official limitations* that are needed to protect everyone else using on demand from a rogue account or query. But you can easily get around any limitations by switching to reserved slots and requesting higher limits. BigQuery is in production at very large scale with several companies. Even limits with message-based ingestion are not an issue; BigQuery ingests into memory first and later commits to storage, which is a better architecture than Snowflake, Redshift, or Athena. Nevertheless, it is still more of a shared service than Snowflake or Redshift, which means it can theoretically hit shared limits.
Redshift is limited in scale because even with RA3, it cannot distribute different workloads across clusters. While it can automatically scale up to 10 clusters to support query concurrency, it can only handle a maximum of 50 queued queries across all clusters by default. In addition, because it locks at the table level, it is better suited for batch ingestion and limited in its write throughput.
BigQuery has not demonstrated significantly better performance or price-performance compared to Snowflake or Redshift. While remote storage access is much faster using the Jupiter petabit network, the constant writing to and fetching from shared memory over the network for each stage of the query execution (in the DAG) seems to eliminate that advantage. So does the fact that BigQuery does not use indexing. It means slots still have to process all the data stored in larger segments without filtering down to smaller (sorted) ranges. However, BigQuery does have lower latency for message-based ingestion since it does in fact ingest one row at a time and make it immediately available for querying.
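The indexing point above can be illustrated with a small, engine-agnostic sketch. It is not BigQuery's implementation, or that of any other warehouse. It assumes sorted data split into fixed-size blocks and a hypothetical sparse index that stores the first key of each block, then counts how many rows a range filter has to touch with and without that index. All function names are made up for illustration.

```python
from bisect import bisect_left, bisect_right


def build_blocks(sorted_values, block_size):
    """Split already-sorted values into fixed-size blocks (stand-ins for segments)."""
    return [
        sorted_values[i:i + block_size]
        for i in range(0, len(sorted_values), block_size)
    ]


def full_scan(blocks, lo, hi):
    """Without an index: every row in every block is read and filtered."""
    scanned, matches = 0, []
    for block in blocks:
        for value in block:
            scanned += 1
            if lo <= value <= hi:
                matches.append(value)
    return matches, scanned


def sparse_index_scan(blocks, lo, hi):
    """With a sparse index (first key per block): read only candidate blocks."""
    first_keys = [block[0] for block in blocks]
    # Start one block before the first block whose first key is >= lo,
    # because that earlier block may still contain values >= lo.
    start = max(bisect_left(first_keys, lo) - 1, 0)
    # Stop after the last block whose first key is <= hi.
    end = bisect_right(first_keys, hi)
    scanned, matches = 0, []
    for block in blocks[start:end]:
        for value in block:
            scanned += 1
            if lo <= value <= hi:
                matches.append(value)
    return matches, scanned


if __name__ == "__main__":
    blocks = build_blocks(list(range(1000)), block_size=100)
    _, rows_full = full_scan(blocks, 250, 260)
    _, rows_indexed = sparse_index_scan(blocks, 250, 260)
    print(f"full scan read {rows_full} rows, sparse index read {rows_indexed} rows")
```

Both functions return the same matches. Only the number of rows read differs, which is the effect the paragraph attributes to filtering down to smaller sorted ranges.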
Redshift does provide a result cache for accelerating repetitive query workloads and also has more tuning options than some others. But it does not deliver much faster compute performance than other cloud data warehouses in benchmarks. While its storage access is more efficient, with smaller data block sizes being fetched over the network, it does not perform a lot of query optimization, and has no support for indexes. It also has less support for semi-structured data or low-latency ingestion at any reasonable scale.
BigQuery, like Snowflake, has broader support for use cases beyond reporting and dashboards. You can isolate workloads by assigning each workload to different reserved slots. Unlike Snowflake, Redshift, or Athena, BigQuery also supports low latency streaming. But like these other three technologies, BigQuery also lacks the performance to support interactive or ad hoc queries at scale. This eliminates BigQuery from being a great option for many operational and customer-facing use cases, where users will tolerate a wait of a few seconds at worst, which translates to sub-second query times for the data warehouse.
Redshift was originally designed to support traditional internal BI reporting and dashboard use cases for analysts. Without second-level performance, it cannot support any interactive and ad hoc analytics. It also has a limit of 50 queued queries by default, which limits concurrency, and a lack of support for continuous ingestion. All of these limitations mean Redshift is a poor fit for operational and customer-facing use cases.
BigQuery has three different pricing models: on demand, reserved, and flex pricing. If you need a data warehouse, you probably should not be using on demand unless each query scans only a small amount of data. You should be using reserved slots with flex slots to reduce the costs of workload variations. When you do, your costs will not be far off from Snowflake or Redshift for regular data warehouse workloads. BigQuery does give you the option to also support infrequent analytics, more in line with Athena. In other words, it is the best of both more traditional worlds. Nevertheless, BigQuery’s price-performance is in line with Snowflake and Redshift, which is up to 10x more expensive than Firebolt.
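The on-demand versus reserved trade-off can be sketched as a simple break-even calculation. This assumes on-demand cost grows linearly with data scanned and reserved capacity is a flat monthly fee. The function names and the prices in the usage example are hypothetical placeholders, not published BigQuery rates.

```python
def on_demand_cost(tb_scanned_per_month, price_per_tb):
    """Cost when paying per terabyte scanned (assumed linear)."""
    return tb_scanned_per_month * price_per_tb


def break_even_tb(reserved_monthly_cost, price_per_tb):
    """Monthly scan volume at which on-demand cost equals the flat reserved fee."""
    return reserved_monthly_cost / price_per_tb


def cheaper_model(tb_scanned_per_month, price_per_tb, reserved_monthly_cost):
    """Return which model costs less for the given monthly scan volume."""
    if on_demand_cost(tb_scanned_per_month, price_per_tb) <= reserved_monthly_cost:
        return "on demand"
    return "reserved"


if __name__ == "__main__":
    # Placeholder numbers for illustration only; substitute your own quotes.
    price_per_tb = 5.0
    reserved_monthly_cost = 2000.0
    print(break_even_tb(reserved_monthly_cost, price_per_tb))
    print(cheaper_model(50, price_per_tb, reserved_monthly_cost))
    print(cheaper_model(1000, price_per_tb, reserved_monthly_cost))
```

Below the break-even volume, light or infrequent workloads favor on demand. Above it, a steady warehouse workload favors reserved capacity, which matches the guidance in the paragraph.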
Redshift, while it is arguably the most mature and feature-rich, is also the most like a traditional data warehouse in its limitations. This makes it the hardest to manage, and costly overall for traditional reporting and dashboards, and not as well suited for the newer use cases.