Firebolt vs BigQuery

A detailed comparison

Compare Firebolt vs BigQuery by the following set of categories:

Category | Firebolt | BigQuery
Elasticity - separation of storage and compute | Yes | Yes
Supported cloud infrastructure | AWS only | Google Cloud only
Isolated tenancy - option for dedicated resources | Multi-tenant dedicated resources | Multi-tenant on demand and reserved resources only
Compute - node types | 1-128 choice of any types | No choice over (fixed) slot size
Data - internal/external, writable storage | External tables (used for ingestion) | External tables supported - 4 concurrent queries by default
Security - data | Dedicated resources for storage and compute, encryption at rest, RBAC | Separate customer keys, column-level encryption, encryption at rest, AEAD individual value encryption, RBAC
Security - network | Firewall and WAF, SSL, PrivateLink whitelist/blacklist control, isolated tenancy | TLS, Firewall (Google Cloud), VPN, whitelist/blacklist control part of GCP

Firebolt also has a decoupled storage and compute architecture, and adds storage and query optimizations for 10x better performance and increased efficiency. It allows SQL to be run against external data formats to support ingestion, and lets you choose the node type and number of nodes for each engine (cluster). It currently runs only on AWS.

BigQuery was one of the first decoupled storage and compute architectures, released before Snowflake. It is a unique piece of engineering and not a typical data warehouse, in part because it started as an on-demand serverless query engine. While its petabit network dramatically lowers network latency for data access for any given compute step, the additional network traffic caused by transferring data to shared memory over the network after each slot finishes its job, instead of caching it locally, seems to eliminate any major advantage in actual benchmarks. If BigQuery does start to cache locally on slots, watch out Firebolt, you might have some closer competition.

Firebolt vs BigQuery - Architecture

The biggest differences among cloud data warehouses are whether they separate storage and compute, how much they isolate data and compute, and what clouds they can run on.

Category | Firebolt | BigQuery
Elasticity - (individual) query scalability | 1-click cluster resize of node type, number of nodes | Automatic allocation of each query to on demand, or reserved and flex slots
Elasticity - user (concurrent query) scalability | Unlimited manual scaling | Limited to 100 concurrent users by default*
Write scalability - batch | Strong. Multi-master parallel batch | 1,500 load jobs/day (~1 per minute), 100,000 per project, 15TB per job, 6 hour max time*
Write scalability - continuous | Multi-master continuous writes | 1GB/sec w/ no dedup, 100MB/sec w/ dedup, 100K rows per second per table, 100K-500K per project by default*
Data scalability | No limit | No real limit

Firebolt provides the same scalability benefits of a decoupled storage and compute architecture. It improves compute efficiency through its optimizations, and by allowing the choice of any sized node and number of nodes for each cluster. It also improves write scalability and supports continuous ingestion. Firebolt also improves network efficiency by only accessing the data ranges needed, not entire partitions.

BigQuery on demand has several official limitations* that are needed to protect everyone else using on demand from a rogue account or query. But you can easily get around any limitations by switching to reserved slots and requesting higher limits. BigQuery is in production at very large scale with several companies. Even limits with message-based ingestion are not an issue; BigQuery ingests into memory first and later commits to storage, which is a better architecture than Snowflake, Redshift, or Athena. Nevertheless, it is still more of a shared service than Snowflake or Redshift, which means it can theoretically hit shared limits.
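To put the default limits quoted in this comparison in perspective, the short Python sketch below turns them into rates. It is only arithmetic on the figures stated in this article (1,500 load jobs per day and 100K streamed rows per second per table); the variable names are illustrative, and nothing here calls a BigQuery API or reads live quota values.

```python
# Back-of-the-envelope check of the default limits quoted in this article.
# The two input figures come from the comparison table; they are not read
# from BigQuery itself and may differ from current quotas.

MINUTES_PER_DAY = 24 * 60
SECONDS_PER_DAY = 24 * 60 * 60

load_jobs_per_day = 1_500          # batch load jobs per day (article figure)
streaming_rows_per_sec = 100_000   # streamed rows per second per table (article figure)

# Average rate of batch loads if the daily quota is spread evenly.
jobs_per_minute = load_jobs_per_day / MINUTES_PER_DAY

# Smallest sustainable gap between two batch loads under that quota.
min_seconds_between_jobs = SECONDS_PER_DAY / load_jobs_per_day

# Upper bound on rows streamed into one table in a day at the default limit.
max_streamed_rows_per_day = streaming_rows_per_sec * SECONDS_PER_DAY

print(f"{jobs_per_minute:.2f} load jobs per minute")
print(f"one load job every {min_seconds_between_jobs:.1f} seconds at most")
print(f"{max_streamed_rows_per_day:,} streamed rows per table per day at most")
```

The first figure confirms the "~1 per minute" note in the table: batch loading alone cannot deliver data much fresher than about a minute under the default quota, which is why the streaming path matters for low latency use cases.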

Firebolt vs BigQuery - Scalability

There are three big differences among data warehouses and query engines that limit scalability: decoupled storage and compute, dedicated resources, and continuous ingestion.

Category | Firebolt | BigQuery
Indexes | Indices for data access, joins, aggregation, search | None
Query optimization - performance | Index- and cost-based optimization, vectorization JIT, pushdown optimization | Cost-based optimization
Tuning | Choice of node size, indexing, Optimized F3 storage (on S3). Data access integrated across storage and cache | Can only purchase reserved or flex slots
Storage format | Optimized F3 storage (on S3). Data access integrated across disk, SSD, RAM | Optimized (capacitor) on Colossus
Ingestion performance | Multi-master, lock-free high performance ingestion with unlimited scale for batch or continuous ingestion | Writes 1 row at a time. Limits of 100K messages/sec by default*
Ingestion latency | Immediately visible during ingestion | Immediately visible during ingestion
Partitioning | F3 with sparse indexing | Partitions, pruning
Caching | F3 (cache) with aggregating and join indexes | Result cache (24 hours), shared memory
Semi-structured data - native JSON functions within SQL | Yes (Lambda) | Yes
Semi-structured data - native JSON storage type | Yes (Nested array type, compact) | Can store as strings or STRUCT. But requires UDFs for compute
Semi-structured data - performance | Fast (array operations without full scans) | JSON strings slow. Can store as STRUCT and use UDF (JS)

Firebolt has clearly demonstrated that storage and compute optimization, along with indexing, make a big difference in performance. Benchmarks by Firebolt, customers and prospects have demonstrated performance gains of 4x to 6,000x across a wide range of queries compared to any of the alternatives. This comes in part from more efficient storage access, where its F3 format and remote data access fetch only the data needed, not entire partitions. Query optimization combined with extensive indexing also makes a big difference, as demonstrated through specific query examples of the impact of primary, aggregating and join indexes. Choice of any size and number of nodes for each engine helps as well. Firebolt also added native semi-structured data support and continuous, low latency ingestion.
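The idea of fetching only the data ranges a query needs, rather than whole partitions, can be illustrated with a generic sparse index over sorted data. The sketch below is a minimal illustration in Python and not Firebolt's F3 implementation: the block size, the function names, and the in-memory list standing in for remote storage are all assumptions made for the example.

```python
from bisect import bisect_left, bisect_right

BLOCK_SIZE = 4  # rows per block; tiny on purpose so the effect is visible


def build_sparse_index(sorted_keys, block_size=BLOCK_SIZE):
    """Keep only the first key of each block instead of indexing every row."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), block_size)]


def blocks_for_range(sparse_index, low, high):
    """Return the block numbers that may contain keys in [low, high]."""
    # The block just before the first one starting at or above `low`
    # may still hold matching keys, so step back by one (conservative).
    first = max(bisect_left(sparse_index, low) - 1, 0)
    # The last block whose first key is <= high.
    last = bisect_right(sparse_index, high) - 1
    return list(range(first, last + 1))


def range_query(sorted_keys, sparse_index, low, high, block_size=BLOCK_SIZE):
    """Read only the candidate blocks, then filter rows inside them."""
    candidate_blocks = blocks_for_range(sparse_index, low, high)
    hits = []
    for b in candidate_blocks:
        block = sorted_keys[b * block_size:(b + 1) * block_size]
        hits.extend(k for k in block if low <= k <= high)
    return hits, len(candidate_blocks)


# Hypothetical data: 20 sorted keys split into 5 blocks of 4 rows each.
keys = list(range(0, 100, 5))
index = build_sparse_index(keys)

hits, blocks_read = range_query(keys, index, 42, 58)
print(hits, f"read {blocks_read} of {len(index)} blocks")
```

In this toy setup the query touches one block out of five. An engine without such an index would have to scan every block in the partition and discard most rows, which is the inefficiency the paragraph above describes.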

BigQuery has not demonstrated significantly better performance or price-performance compared to Snowflake or Redshift. While remote storage access is much faster using the Jupiter petabit network, the constant writing to and fetching from shared memory over the network for each stage of the query execution (in the DAG) seems to eliminate that advantage. So does the fact that BigQuery does not use indexing. It means slots still have to process all the data stored in larger segments without filtering down to smaller (sorted) ranges. However, BigQuery does have lower latency for message-based ingestion since it does in fact ingest one row at a time and make it immediately available for querying.

Firebolt vs BigQuery - Performance

Performance is the biggest challenge with most data warehouses today.
While decoupled storage and compute architectures improved scalability and simplified administration, for most data warehouses they introduced two bottlenecks: storage and compute. Most modern cloud data warehouses fetch entire partitions over the network instead of just fetching the specific data needed for each query. While many invest in caching, most do not invest heavily in query optimization. Most vendors also have not improved continuous ingestion or semi-structured data analytics performance, both of which are needed for operational and customer-facing use cases.

Category | Firebolt | BigQuery
Reporting | Yes | Yes
Dashboards | Fixed view, dynamic / fast queries, changing data | Fixed view
Ad hoc | Sub-second to seconds first-time query performance | Sec-min first-time query performance
Operational or customer-facing analytics (high concurrency, continuously updating / streaming data) | Yes. Support continuous writes at scale, fast semi-structured data performance | Slower query performance. Limited to 100K continuous writes/table, 100 concurrent users by default*
Data processing engine (Exports or publishes data) | Export query results | 1GB max export file size, exports to Google cloud only*
Data science/ML | Export query results | BigQuery ML

Firebolt offers many of the same benefits as Snowflake with its decoupled storage and compute, particularly isolation of workloads and support for high user concurrency. It is the only cloud data warehouse that has optimized compute and storage together for faster ingestion, network and query performance. Its F3 format enables sub-second network access. Indexing and query optimization enable sub-second query performance. It uniquely enables continuous ingestion at scale as well. This makes Firebolt not only well suited for reporting and dashboards, but also much better for interactive and ad hoc use cases, as well as operational and customer-facing use cases.

BigQuery, like Snowflake, has broader support for use cases beyond reporting and dashboards. You can isolate workloads by assigning each workload to different reserved slots. Unlike Snowflake, Redshift, or Athena, BigQuery also supports low latency streaming. But like these other three technologies, BigQuery lacks the performance to support interactive or ad hoc queries at scale. This eliminates BigQuery from being a great option for many operational and customer-facing use cases where the users demand a few seconds of wait at worst, which translates to sub-second query times for the data warehouse.

Firebolt vs BigQuery - Use cases

There are a host of different analytics use cases that can be supported by a data warehouse. Look at your legacy technologies and their workloads, as well as the new possible use cases, and figure out which ones you will need to support in the next few years. They include:

Reporting where relatively static reports are created by analysts against historical data, and used by executives, managers, and now increasingly by employees and customers

Dashboards created by analysts against historical or live data, and used by executives, managers, and increasingly by employees and customers via Web-based applications

Interactive and ad hoc analytics within dashboards or other tools for on-the-fly interactive analysis either by expert analysts, or increasingly by employees and customers via self-service

High performance analytics that require very large or complex queries with sub-second performance.

Big data analytics using semi-structured or unstructured data and complex queries or functionality

Operational and customer-facing analytics built by development teams that deliver historical and live data and analytics to larger groups of employees and customers

Category | Firebolt | BigQuery
Administration - deployment, management | Easy to deploy and resize. Easy to add indexing, change node types | No administration or tuning
Choice - provision different cluster types on same data | Choice of node types, engine sizes | Yes
Choice - provision different number of nodes | Yes | Up to 2000 flex (on demand) slots. Purchase reserved or flex slots 100 at a time with no limits
Choice - provision different node types | Yes | No
Pricing - compute | Choose any node. Compute costs range $1-10/hour with 10x price-performance advantage | On demand - $5/TB data processed, Flex slots $4 for 100 slots per hour, $1700/month per 100 slots
Pricing - storage | $23/TB | $20/TB active storage, $10/TB inactive
Pricing - transferred data | None | Batch is free. Streaming ingest $0.01 per 200MB ($50/TB), streaming reads $1.1/TB

Firebolt is the only data warehouse with decoupled storage and compute that supports ad hoc and semi-structured data analytics with sub-second performance at scale. It also combines simplified administration with choice and control over node types and 10x or greater efficiency for the best price-performance. This makes it the best choice for ad hoc, high performance, operational and customer-facing analytics. 

BigQuery has three different pricing models: on demand, reserved, and flex pricing. If you need a data warehouse, you probably should not be using on demand unless you do not need to scan a lot of data for each query. You should be using reserved slots with flex slots to reduce the costs of workload variations. When you do, your costs will not be far off from Snowflake or Redshift for regular data warehouse workloads. BigQuery does give you the option to also support infrequent analytics, more in line with Athena. In other words, it is the best of both more traditional worlds. Nevertheless, BigQuery’s price-performance is in line with Snowflake and Redshift, which is up to 10x more expensive than Firebolt.
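The advice to move off on demand once queries scan a lot of data can be made concrete with a rough break-even calculation. The Python sketch below uses only the list prices quoted in this comparison ($5/TB on demand, $4 per 100 flex slots per hour, $1,700 per month per 100 reserved slots). The function names are hypothetical, and the model assumes the purchased slots are enough to run the workload in acceptable time, which the prices alone cannot tell you.

```python
# Rough break-even between BigQuery pricing models, using the prices
# quoted in this article. This is cost arithmetic only; it does not model
# whether a given number of slots can actually keep up with the workload.

ON_DEMAND_USD_PER_TB = 5.0
FLEX_USD_PER_100_SLOTS_HOUR = 4.0
RESERVED_USD_PER_100_SLOTS_MONTH = 1700.0


def on_demand_cost(tb_scanned):
    """Monthly on-demand cost for a given volume of data scanned."""
    return tb_scanned * ON_DEMAND_USD_PER_TB


def reserved_cost(slot_blocks):
    """Monthly cost of `slot_blocks` blocks of 100 reserved slots."""
    return slot_blocks * RESERVED_USD_PER_100_SLOTS_MONTH


def flex_cost(slot_blocks, hours):
    """Cost of `slot_blocks` blocks of 100 flex slots held for `hours`."""
    return slot_blocks * hours * FLEX_USD_PER_100_SLOTS_HOUR


def break_even_tb_per_month(slot_blocks):
    """TB scanned per month above which reserved beats on demand."""
    return reserved_cost(slot_blocks) / ON_DEMAND_USD_PER_TB


def flex_break_even_hours():
    """Hours per month above which reserved beats flex for the same slots."""
    return RESERVED_USD_PER_100_SLOTS_MONTH / FLEX_USD_PER_100_SLOTS_HOUR


print(f"reserved beats on demand above {break_even_tb_per_month(1):.0f} TB scanned/month")
print(f"reserved beats flex above {flex_break_even_hours():.0f} slot-block hours/month")
print(f"500 TB on demand: ${on_demand_cost(500):,.0f} vs 100 reserved slots: ${reserved_cost(1):,.0f}")
```

At these prices, a workload scanning more than roughly 340 TB a month per 100 slots is cheaper on reserved capacity, and flex slots only make sense for bursts shorter than about 425 hours a month.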

Firebolt vs BigQuery - Cost

This is perhaps the strangest, and yet the clearest comparison: cost. There are a lot of differences in the details, but at a high level, the main differences should be clear.
