BigQuery was one of the first decoupled storage and compute architectures. It is a unique piece of engineering and not a typical data warehouse in part because it started as an on-demand serverless query engine. It runs in multi-tenancy with shared resources, allocated as “slots” which represent a virtual CPU that executes SQL. BigQuery determines how many slots a query requires, without the ability of the user to control it. BigQuery can be priced on a $/TB scanned basis or through slot reservations. A slot in BigQuery is logically equivalent to 0.5 vCPU and 0.5GB of RAM. There are multiple models to allocate slots in BigQuery.
Firebolt is built on a natively decoupled storage & compute architecture, on AWS only. Data has to be copied outside of your VPC into the Firebolt, where both your compute and data run in a dedicated and isolated tenant. A “Firebolt Engine” can be granularly configured across # of nodes and different CPU/RAM/SSD combinations.
BigQuery scales very well to large data volumes, and automatically assigns more compute resources when needed behind the scenes, in the form of “slots”. BigQuery works either in an “on-demand pricing model”, where slot assignment is completely in the hands of BigQuery and the state of the shared resource pool, or in “flat-rate pricing model” where slots are reserved in advanced. With reserved slots there is more control over compute resources, thus making scaling more predictable. Concurrency is limited to 100 users by default.
Firebolt can handle the largest data volumes and concurrency on a single comparable cluster size, thanks to its superior hardware efficiency. Thanks to its decoupled storage & compute architecture it scales very well to large data volumes. However, resizing an engine size isn’t instant and requires orchestration if avoiding downtime is necessary. A single Firebolt engine can support hundreds of concurrent queries, avoiding the need to scale out for most use cases. Scaling horizontally for even higher concurrency is manual.
BigQuery lines up in benchmarks in the same ballpark as other cloud data warehouses but does come in consistently last in most queries. Beyond implementing according to best practices, there is little you can do to accelerate BigQuery performance, as it determines the amount of resources (slots) the query needs for you. BigQuery can be used together with “BigQuery BI Engine” for lower latency analytics. However, BI Engine is limited in terms of scale because it runs in memory. Its maximum capacity is 100GB.
Firebolt is the fastest when it comes to query performance when compared to cloud data warehouses and services like Athena. Its unique approach to storage and indexing results in highly aggressive data pruning that scans dramatically less data compared to other technologies. While other technologies scan partitions or micro-partitions, Firebolt works with indexed data ranges, that are significantly smaller. In addition, Firebolt lets user accelerate queries further with multiple index types (Aggregating index, Join index), and using its decoupled storage & compute architecture workloads can be easily isolated to guarantee consistent performance.
BigQuery is a mature general-purpose data warehouse, which lends itself well to internal BI & reporting. The fact that it’s serverless in nature and tightly integrated with GCP, makes it very convenient for Ad-Hoc analytics and ML use cases on GCP. On the other hand, because BigQuery makes resource allocation decisions for you, it is not always the best fit for operational use cases and Data Apps where performance needs to be consistent and predictable.
Firebolt stands out by being the fastest cloud data warehouse when compared to Snowflake, Redshift, BigQuery and Athena. It’s great for delivering sub-second analytics at scale, while remaining hardware efficient and high concurrency friendly. This makes it a great choice for operational use cases and customer-facing data apps. Given that it is not as feature-rich and integration rich as the more mature data warehouses makes it a lesser fit for a general-purpose Enterprise data warehouse. It is also not the best fit for ad-hoc use cases, because of the need to predefine indexing at the table level.