WHITEPAPER

Data Warehouse Product Comparison: Snowflake vs Firebolt

Table of contents

Introduction
Comparison summary: Snowflake vs Firebolt
Detailed table: Snowflake vs Firebolt
Overall architecture
Scalability
Performance
Cost
Analytics use cases
Conclusion
About Firebolt

Introduction

Where there is data, there is analytics. Over the last decade, as cloud computing has taken hold, so have cloud data warehouses. Nearly a decade ago, Snowflake released one of the first modern cloud data warehouses to enter the market, with elastic scalability built on the separation of storage and compute. Since then, Snowflake has become one of the leading cloud data warehouses in market share, along with Redshift, BigQuery, and Azure Synapse.

But as cloud data warehouse usage has grown, so have the challenges. Some are the same challenges on-premises data warehouses faced for decades, such as the lack of query performance needed for ad hoc analytics, or the ever-increasing volumes of batch data. Others are newer, such as the need to support ever-increasing volumes of streaming and semi-structured data, or hundreds to thousands of concurrent users.

In short, what companies need out of a cloud data warehouse has changed. Snowflake is great for moving historical reporting and business intelligence (BI) workloads to the cloud. But it is not well suited for ad hoc analytics. Even at its fastest, Snowflake is still slower than several of the on-premises data warehouses it has replaced. It is also not as well suited for semi-structured or streaming data, or for operational or customer-facing analytics. Snowflake can also get very expensive, very fast. You may already have heard stories about “credit fever”, or single one-off queries that blew the credit budget.

There have been several technical innovations added to cloud data warehouses to help deliver more performance and scalability out of existing infrastructure, and lower costs. Some came straight from legacy data warehouses. Others are completely new, similar to the types of innovations that happened with Hadoop for batch computing. 

Firebolt is different. It has added many of these new innovations to improve performance, scale and efficiency. By using Firebolt, companies have achieved sub-second performance at petabyte scale, and 4x to 6,000x faster performance than Snowflake across different queries in their benchmarks.

This whitepaper provides a detailed comparison between Firebolt and Snowflake across more than 30 different categories including their overall architectures and core features, scalability, performance, and cost, as well as their suitability across different analytics use cases. It begins with the summary comparison as a table, and then proceeds to explain the differences in each major category in more detail.

Comparison summary: Snowflake vs Firebolt

Snowflake was one of the first modern cloud data warehouses to enter the market nearly a decade ago. Since then, it has become one of the leading cloud data warehouses in market share. It was a 2nd-generation data warehouse that combined the separation of storage and compute for elastic scalability with the simplicity of SaaS, eliminating tuning and other administrative tasks. First launched on AWS as a shared multi-tenant service, Snowflake can now be deployed on AWS, Azure (2018) and Google Cloud (2020), and also as a Virtual Private Snowflake in its own isolated tenant per customer.

Firebolt is much newer to market. After several years of development, Firebolt came out of stealth in 2020 as a cloud-native service on AWS. It is very similar to Snowflake in that it is built on decoupled storage and compute. But it is also the first cloud data warehouse to focus on improving performance, scalability and cost for newer analytics workloads, including high performance interactive and ad hoc analytics, semi-structured data, and operational and customer-facing analytics. By combining more recent innovations in storage, indexing, query optimization and continuous ingestion, Firebolt has been able to deliver an order of magnitude improvement in performance, with sub-second performance from gigabyte to petabyte scale. It has also cut costs an order of magnitude by improving the efficiency of compute and storage, and by allowing customers to choose the right size and number of AWS instance types for each workload.

This whitepaper starts with the conclusion: a detailed comparison of Firebolt and Snowflake in a single table across more than 30 categories spanning architecture, scalability, performance, use cases and cost. A more detailed explanation of each major category follows the table.

Detailed table: Snowflake vs Firebolt

ARCHITECTURE (overall)
- Generation: Snowflake: 2nd generation | Firebolt: 3rd generation
- Elasticity - separation of storage and compute: Snowflake: Yes | Firebolt: Yes
- Supported cloud infrastructure: Snowflake: AWS, Azure, Google Cloud | Firebolt: AWS only
- Tenancy - option for dedicated resources: Snowflake: Multi-tenant dedicated resources (only VPS is an isolated tenant) | Firebolt: Multi-tenant and isolated tenancy options for compute and storage
- Compute - instance types: Snowflake: 1-128 nodes, undisclosed instance types | Firebolt: 1-128 nodes, choice of any instance type
- Data - internal/external, writable storage: Snowflake: External tables supported | Firebolt: External tables (used for ingestion)
- Security - data: Snowflake: Separate customer keys (only VPS is an isolated tenant), encryption at rest, RBAC | Firebolt: Multi-tenant with customer keys or isolated tenant for storage and compute, encryption at rest, RBAC
- Security - network: Snowflake: Firewall, SSL, PrivateLink, whitelist/blacklist control, isolated for VPS only | Firebolt: Firewall and WAF, SSL, PrivateLink, whitelist/blacklist control, isolated tenancy option

Scalability
- Elasticity - (individual) query scalability: Snowflake: 1-click cluster resize, no choice of node size | Firebolt: 1-click cluster resize with choice of EC2 instance type and number of nodes
- Elasticity - user (concurrent query) scalability: Snowflake: Autoscale up to 10 warehouses; limited to 20 queued DML writes per table | Firebolt: Unlimited manual scaling
- Write scalability - batch: Snowflake: Strong | Firebolt: Strong; multi-master parallel batch
- Write scalability - continuous: Snowflake: Limited to 20 queued DML writes per table | Firebolt: Unlimited continuous ingestion
- Upserts: Snowflake: Yes (slow; micro-partition rewrites) | Firebolt: Yes (fast)
- Data scalability: Snowflake: No specified storage limit; up to a 4XL warehouse (128 nodes); 16MB max field size | Firebolt: No limit

Performance
- Query optimization - performance: Snowflake: Cost-based optimization, vectorization | Firebolt: Index- and cost-based optimization, vectorization, JIT compilation, pushdown optimization
- Tuning - CPU, RAM, SSD, storage: Snowflake: Can only choose warehouse size, not node types | Firebolt: Choice of any node size (CPU/RAM/SSD), tuning options
- Ingestion performance: Snowflake: Batch-centric (micro-partition-level locking, limit of 20 queued writes per table) | Firebolt: Multi-master, lock-free high-performance ingestion with unlimited scale for batch or continuous ingestion
- Ingestion latency: Snowflake: Batch writes preferred (1+ minute intervals); requires rewriting entire micro-partitions | Firebolt: Immediately visible during ingestion
- Indexes: Snowflake: None | Firebolt: Indexes for data access, joins, aggregation
- Storage format: Snowflake: Optimized micro-partition storage (S3), separate RAM | Firebolt: Optimized F3 storage (on S3); data access integrated across disk, SSD and RAM
- Partitioning: Snowflake: Micro-partition pruning, cluster keys | Firebolt: Sparse indexing
- Caching: Snowflake: Result cache, materialized views | Firebolt: F3 cache, aggregating and join indexes
- Semi-structured data - native JSON functions within SQL: Snowflake: Yes | Firebolt: Yes (Lambda functions)
- Semi-structured data - native JSON storage type: Snowflake: Limited (VARIANT, single field) | Firebolt: Yes (nested array type, compact)
- Semi-structured data - performance: Snowflake: Slow (full load into RAM, full scans) | Firebolt: Fast (array operations without full scans)

Use cases
- Reporting: Snowflake: Yes | Firebolt: Yes
- Dashboards: Snowflake: Fixed views | Firebolt: Fixed views, dynamic views, changing data
- Ad hoc: Snowflake: Seconds-to-minutes first-time query performance | Firebolt: Sub-second first-time query performance
- Operational or customer-facing analytics (high concurrency, continuous writes / streaming data): Snowflake: Limited continuous writes and concurrency, slow semi-structured data performance | Firebolt: Yes; continuous ingestion at scale, fast semi-structured performance
- Data processing engine (exports or publishes data): Snowflake: Export query results or tables | Firebolt: Export query results
- Data science/ML: Snowflake: Spark, Arrow, Python connectors, integration with ML tools, export query results | Firebolt: Export query results

Cost
- Administration - deployment, management: Snowflake: Easy to deploy and resize; strong performance visibility, limited tuning | Firebolt: Easy to deploy and resize; easy to add indexes, change instance types
- Choice - provision different cluster types on same data: Snowflake: Yes | Firebolt: Yes
- Choice - provision different number of nodes: Snowflake: Yes | Firebolt: Yes
- Choice - provision different instance types: Snowflake: No | Firebolt: Yes
- Pricing - compute: Snowflake: $2-$4+ per node-hour; fast analytics need a large cluster ($16-$32+/hour or greater) | Firebolt: Choose any node size and number; compute costs deliver a 10x or greater price-performance advantage
- Pricing - storage: Snowflake: All storage; $23/$40 per TB per month (up front / on demand) | Firebolt: $23 per TB per month
- Pricing - transferred data: Snowflake: None | Firebolt: None

Overall architecture

Snowflake architecture

Snowflake has three major layers in its architecture: storage, compute and cloud services. The storage layer contains all data and is encrypted with separate keys for each customer. Snowflake is multi-tenant by default for the standard, enterprise and business critical editions. While customer data is protected with individual customer keys and encrypted at rest, it resides in a shared tenancy. Only Virtual Private Snowflake (VPS) offers isolated tenancy, with each customer in an isolated Snowflake account.

The unit of computing is a virtual warehouse (also called a warehouse), a cluster of compute nodes dedicated to a specific customer. At any time a customer can provision a new virtual warehouse with 1, 2, 4, 8, 16, 32, 64 or 128 nodes, where each larger warehouse also has larger individually sized nodes. Snowflake does not disclose node details such as the instance type, CPU, RAM, SSD or disk. When you provision a new warehouse, users are assigned to it, and any data those users need is loaded on demand. The cloud services layer manages users, access, security and other aspects of Snowflake.
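As a concrete illustration, provisioning and resizing a warehouse are a few lines of standard Snowflake DDL (the warehouse name and settings here are illustrative):

    -- Provision a warehouse; LARGE corresponds to 8 nodes, and each size up doubles the node count.
    CREATE WAREHOUSE reporting_wh
      WITH WAREHOUSE_SIZE = 'LARGE'
      AUTO_SUSPEND = 300      -- suspend after 5 idle minutes to stop consuming credits
      AUTO_RESUME = TRUE;

    -- Resizing is a single statement; Snowflake provisions the new nodes transparently.
    ALTER WAREHOUSE reporting_wh SET WAREHOUSE_SIZE = 'XLARGE';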

Snowflake has several methods of ingesting data, including Snowpipe for batch ingestion and direct writes. But Snowflake is batch-centric. Snowflake does not recommend Snowpipe batch intervals more frequent than one minute. In addition, each table can only have up to 20 queued DML (write) statements, and micro-partitions, the standard blocks of storage, are an immutable columnar format that must be rewritten with each individual write. These limitations make Snowflake less suitable for continuous ingestion.
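For reference, a minimal Snowpipe definition looks like the following standard Snowflake DDL (the stage, table and file format are illustrative); the one-minute batching guidance and the 20 queued DML statements per table apply regardless of how the pipe is defined:

    -- Continuously copy files from an external stage as they arrive.
    CREATE PIPE events_pipe
      AUTO_INGEST = TRUE
      AS COPY INTO events
         FROM @events_stage
         FILE_FORMAT = (TYPE = 'JSON');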

Snowflake’s security is effectively on par with the security of other offerings, including Firebolt’s. Both encrypt data at rest, secure network connections, provide firewall protection that includes whitelist/blacklist-level control, and provide role-based access control (RBAC).

Firebolt architecture

Firebolt is a combination of a multi-tenant service used by all customers to manage their deployments, and customer deployments that are their own isolated tenants for storage and compute. The multi-tenant services are for administration, storing metadata about deployments, and security, all similar to Snowflake’s.

Within each isolated tenant, customers can start up dedicated compute clusters, called engines, by choosing a number of nodes, from 1 to 128, and just about any AWS instance type and size. Administrators can then assign any combination of users and specific databases to multiple engines, and run different workloads on them. For example, you can have one cluster doing ingestion/ETL for a few databases, and another cluster that queries the same databases. As with Snowflake, you can resize an engine at any time. There is no limit to the number of engines you can run, and no physical limit to database size.
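Engines can be created and resized in the Firebolt UI; newer releases also expose engine management in SQL. The sketch below is illustrative only, and the option names (SPEC, SCALE) and values are assumptions based on Firebolt's engine documentation rather than exact syntax:

    -- Hypothetical sketch: one engine tuned for ingestion, another for queries, over the same database.
    CREATE ENGINE ingest_engine
      WITH SPEC = 'c5d.4xlarge'   -- exact EC2 instance type (assumed option name)
           SCALE = 4;             -- number of nodes (1-128)

    CREATE ENGINE analytics_engine
      WITH SPEC = 'r5d.8xlarge'   -- RAM-heavy nodes for query workloads
           SCALE = 2;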

Similar to Snowflake, Firebolt will pull data into the local engine as needed. Unlike Snowflake, the Firebolt File Format (F3, pronounced “TripleF”) is optimized to support both ingestion and query performance. F3 acts as a data access layer that manages data across S3 storage, engines and RAM. During ingestion, any number of Firebolt engines can perform batch and continuous ingestion at any scale. Engine nodes ingest data in parallel, with formats ranging from Parquet, ORC or Avro to JSON. Firebolt has implemented a multi-master write architecture, which means any node can perform any write. As the data is ingested, it is segmented, optimized, and written to F3 storage using operating-system-level drivers designed for sparse storage. The moment data is ingested into RAM it is visible in queries, and a query always returns the latest data.

Firebolt also manages and stores indexes with the data. This includes sparse indexes for accessing data, aggregating indexes to accelerate dimensional analytics, and join indexes, which deliver similar performance to materialized views of joins without having to store and update the view. Throughout ingestion, all indexes are continuously updated. Both indexes and data are fetched from storage and cached locally for performance.
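A sketch of what this looks like in Firebolt DDL, with illustrative table and column names; the sparse primary index is declared with the table, and the aggregating index maintains precomputed aggregates during ingestion:

    CREATE FACT TABLE pageviews (
      visit_date  DATE,
      site_id     INT,
      user_id     BIGINT,
      views       INT
    ) PRIMARY INDEX visit_date, site_id;   -- sparse index used to prune data ranges

    -- Precomputed, continuously updated aggregates for dimensional queries.
    CREATE AGGREGATING INDEX pageviews_by_site
    ON pageviews (site_id, SUM(views), COUNT(DISTINCT user_id));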

Scalability

Snowflake scalability

Overall, Snowflake provides solid elastic scalability, with a few exceptions. The unit of compute scalability in Snowflake is a warehouse. The only way to scale for query size or complexity is to scale up a warehouse, where each larger size increases the size of each node instance incrementally and doubles the number of nodes. This can make scaling to handle increased query size and complexity very expensive (more in the cost section). When you increase or decrease the size of your warehouse, Snowflake provisions the new warehouse (usually within minutes) and moves users to it. Any data needed is loaded on demand from storage.

The way to scale users is to add more warehouses. You can manually assign different users to different warehouses (of different sizes), or use the multi-cluster option available with enterprise and higher editions. Multi-cluster lets you choose a minimum and maximum number of warehouses from 1 to 10. You can configure Snowflake to automatically add another warehouse of the same size if it detects any queuing of queries (standard policy), or queuing longer than 6 minutes (economy policy). Snowflake will also automatically load balance, start up, and shut down additional warehouses when these conditions are no longer met.
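In standard Snowflake DDL (Enterprise edition or higher), the multi-cluster settings look like the following, with illustrative names:

    CREATE WAREHOUSE bi_wh
      WITH WAREHOUSE_SIZE = 'MEDIUM'
      MIN_CLUSTER_COUNT = 1
      MAX_CLUSTER_COUNT = 10        -- add same-size clusters as queries queue
      SCALING_POLICY = 'ECONOMY';   -- or 'STANDARD' to add a cluster as soon as queuing is detected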

While this approach automates scaling, you can suddenly find yourself paying for an entire new warehouse to support a single extra user, which can be very expensive if performance requires large warehouse sizes.

In addition, Snowflake is better suited for batch-based ingestion. Its current limitations of one-minute minimum intervals for Snowpipe, 20 queued writes per table, and micro-partition-level locking (because the entire micro-partition must be rewritten with each write) limit continuous ingestion scalability.

Firebolt scalability

Firebolt has a similar architecture in that you can assign any users to any engines, which are the equivalent of warehouses in Snowflake. It not only scales for batch ingestion, but supports continuous ingestion as well (see the architecture section). Firebolt does not currently provide auto-scaling, though it is planned; today, starting resources can be mostly automated through scripts, and unused resources can be configured to stop automatically. As with Snowflake, there is no limit on user scaling; you can provision any number of engines of any node size, up to 128 nodes each. Where Firebolt shines is in more efficient scaling, from features that include the ability to choose any EC2 instance type, continuous ingestion, sparse indexing and F3 storage that lead to more efficient data access, and semi-structured data support.

Performance

Snowflake performance

While Snowflake was a true innovator in providing elastic scale, it did not significantly improve performance. In fact, Snowflake is slower than several of the legacy on-premises data warehouses it has been replacing. This is in part because Snowflake has focused on simplicity of administration, not on performance optimization or tuning options.

With Snowflake you do not have many options to improve performance. You do not know, and cannot choose, the size of individual nodes, for example. You can only increase the size of your warehouse, add materialized views, or leverage cluster keys.

Improving query performance for complex queries, large data sets and semi-structured data is a major challenge. They all require nodes that can hold all the required data in RAM to deliver fast performance. Otherwise, as a node runs out of RAM, it will start to spill data from RAM to disk (as virtual memory), and this paging will dramatically slow down query performance. While you cannot see the exact instance size, each larger warehouse size provides a slightly larger node size as well. So you can increase the warehouse size until the nodes are large enough to hold the data in RAM. But that means doubling the cost of the warehouse and the number of nodes with each increase until you reach a large enough node.

You also need to understand the difference between first-time and repeated query performance. Snowflake is used primarily to move more traditional reporting and dashboard-based applications to the cloud. It has a tiered caching architecture that performs well when the same queries are run many times. But the architecture does not support ad hoc analytics well, because a first-time query can easily take tens of seconds to minutes, and many ad hoc queries are first-time queries.

The first time data is needed, it is transferred from remote storage into the virtual warehouse’s local cache storage, such as SSD. A sizable query, according to Fivetran and other benchmarks, can easily take tens of seconds to minutes. The query result is then stored in the result cache. Once all of the data is in the local warehouse cache, queries can run 10x faster, with second-level performance. If the exact query result already exists in the result cache, the query can return the result another 10x faster than the local disk cache, with sub-second performance. But that requires the query to be identical and the underlying data to be unchanged.
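One practical consequence: to measure first-time (cold) performance rather than cached results, the result cache can be disabled per session with a standard Snowflake parameter:

    -- Force queries in this session to bypass the result cache.
    ALTER SESSION SET USE_CACHED_RESULT = FALSE;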

The other way to improve performance is materialized views, which maintain an up-to-date result of a query. This can significantly improve the performance of complex queries, including queries against semi-structured data. But given the additional costs in compute and storage, it is only useful for repeated queries against slowly changing data. Materialized views are also limited in the SQL they support: they can only be defined over a single table and do not support joins.
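A minimal example in standard Snowflake syntax (table and columns are illustrative); note that the view is defined over exactly one table, since a definition containing a join would be rejected:

    CREATE MATERIALIZED VIEW daily_revenue AS
    SELECT order_date, SUM(amount) AS revenue
    FROM orders
    GROUP BY order_date;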

Snowflake has added some other performance optimizations. Some, such as query vectorization, are becoming more common. A more distinctive optimization in Snowflake is the combination of micro-partitions and cluster keys. Micro-partitions are Snowflake’s contiguous units of columnar storage. They vary in size from 50-500MB, in part to support updates. Whenever an update happens, the entire micro-partition must be rewritten, because micro-partitions are immutable. But this is done transparently. Data ranges are maintained for each micro-partition to help prune micro-partitions during queries and improve performance. In addition to manual sorting, you can also choose a subset of columns as a cluster key. Snowflake will automatically cluster data within micro-partitions based on the composite key. Cluster keys help improve columnar compression by clustering similar values together, in addition to improving query performance through the pruning of unneeded ranges of micro-partitions.
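Cluster keys are set with one statement, and clustering quality can be inspected with a built-in function (standard Snowflake syntax; table and columns are illustrative):

    ALTER TABLE events CLUSTER BY (event_date, tenant_id);

    -- Report how well micro-partitions are clustered on a given column.
    SELECT SYSTEM$CLUSTERING_INFORMATION('events', '(event_date)');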

Firebolt performance

Firebolt’s biggest innovations are in performance and price-performance. With elastic scalability, where doubling the resources for longer queries can cut query times in half, it is important to measure price-performance (performance gain times price advantage). In customer benchmarks, Firebolt delivered 4x to 6,000x faster performance than Snowflake across a wide range of queries and data sets ranging from gigabytes to petabytes. Compared to Snowflake, customers have seen an order of magnitude performance improvement. One customer saw a 30x price-performance advantage, with 3x the performance on a cluster with 10x lower cost.

The core of Firebolt’s innovation is the combination of its storage, indexing and query engine, optimized together for performance as well as elastic scalability. Its query engine is written in C++ for fast (sub-second) and predictable (low standard deviation) query performance.

For storage optimization, F3 provides a unified data access layer that transparently manages data caching and access across the tiered data layers, from local cache to decoupled data storage. The query optimizer leverages sparse indexes and any join or aggregating indexes to identify the location of data and compile query plans just in time (JIT) using cost-based optimization. This includes reordering query plans to get results from indexes or caches instead of remote storage, and performing pushdown optimization. Other query engine optimizations include query vectorization, advanced functions and native data type support.

Firebolt also provides indexing. While Snowflake keeps track of data ranges in micro-partitions (and cluster keys) to prune micro-partitions and improve performance, it does not provide any indexing; it relies on columnar storage for speed. Firebolt provides sparse indexes for much more granular data pruning, aggregating indexes for dimensional analytics, and join indexes that provide similar performance to precomputed materialized joins without the overhead of storage and updates. Snowflake has no equivalent, since its materialized views do not support joins or a broad range of aggregation functions.
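As a sketch of the join index DDL Firebolt has documented (names are illustrative; treat the exact syntax as approximate):

    -- Precompute the dimension-table lookup so queries join without scanning the dimension table.
    CREATE JOIN INDEX customers_join_idx
    ON customers (customer_id, region, segment);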

Firebolt also allows you to choose any instance type for a node, and any number of nodes. This means you can achieve the best balance of CPU, RAM, SSD, disk and scale-out for the best price-performance. For example, you can choose a small number of very large nodes to support complex queries, instead of having to go up to a 128-node Snowflake cluster just to increase the node size.

Another key advantage is Firebolt’s native support for JSON. Snowflake stores JSON as raw text in a VARIANT column. While it creates and stores metadata to help process the JSON, Snowflake has to load all the JSON into RAM first, and then perform full scans for processing. This ends up being expensive, because you need to choose large cluster sizes to get nodes large enough to fit all the JSON in RAM, and slow, because you need to perform full scans. Firebolt allows any combination of flattening JSON or storing it natively as a nested array structure. You can UNNEST any data, and load JSON natively entirely within SQL using a few commands. The preserved JSON can then be queried within SQL using native Lambda-style functions. Not only is the data stored in an efficient fashion, where individual elements can be compressed; operations can also be performed without full scans.
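To make the contrast concrete, the two snippets below are sketches with illustrative table, column and path names. The Snowflake query uses its standard VARIANT path syntax and LATERAL FLATTEN; the Firebolt query uses one of its documented Lambda-style array functions (treat the exact function name and syntax as approximate):

    -- Snowflake: explode a JSON array stored in a VARIANT column, then filter.
    SELECT f.value:sku::string AS sku
    FROM orders, LATERAL FLATTEN(input => payload:line_items) f
    WHERE f.value:amount > 100;

    -- Firebolt: operate on a natively stored nested array without flattening or full scans.
    SELECT ARRAY_COUNT(x -> x > 100, line_item_amounts) AS big_items
    FROM orders;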

Firebolt has also added capabilities that make it much better suited for continuous ingestion workloads than Snowflake. Multi-master, lock-free ingestion, combined with the ability to see data in RAM the moment it is ingested, means Firebolt can support low latency streaming use cases far better than Snowflake.

Cost

Snowflake cost

Snowflake simplifies administration, but the cost of inefficient scaling and performance, together with its pricing, can far outweigh the benefits for those prepared to manage and tune clusters.

Snowflake offers 4 editions: standard, enterprise, business critical and Virtual Private Snowflake (VPS), at $2, $3, $4 and undisclosed (higher) pricing per credit respectively. The closest edition to Firebolt is VPS, since business critical adds encryption, PrivateLink and failover, and VPS adds tenant isolation.

You can think of Snowflake pricing as a markup on the storage and compute it sells you. A credit is one node compute-hour. A business critical 128-node warehouse will cost you $512 an hour, or $4.49 million list price for 8,760 hours a year in compute. Storage is cheap at $23 per TB per month prepaid, or $40 per TB per month on demand, but you do get charged for other data needs such as staging data. A petabyte costs $276K annually at list price.
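Those numbers are straightforward to verify; the arithmetic below (runnable in any SQL engine) reproduces the list-price figures above:

    SELECT 128 * 4          AS dollars_per_hour,            -- 128 nodes x $4/credit = $512/hour
           128 * 4 * 8760   AS compute_dollars_per_year,    -- $4,485,120, i.e. ~$4.49M
           1000 * 23 * 12   AS petabyte_storage_per_year;   -- 1,000 TB x $23/TB x 12 = $276,000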

Snowflake also charges for services beyond virtual warehouses. Snowpipe, which is used for batch data ingestion, does not require a warehouse to be running, but does charge for compute resources plus a fixed price per file. Database replication costs are similar: you are charged for compute resources per second, along with storage and data transfer. The same goes for materialized view maintenance.

The biggest challenge with Snowflake is that your main option for improving performance (see the Performance section) is scaling up your warehouse. A query can only run within a single warehouse, so the only way to partition the work is to increase the warehouse size, which doubles both the node count and the cost with each size up. If you need to improve individual node performance, you have to keep scaling your warehouse to incrementally grow the node size until your nodes are big enough. Otherwise RAM will spill over to disk, and performance will drop fast.

The other big challenge is that Snowflake charges a big markup on compute for any size node. The wrong query at the wrong time can cost a fortune in credits. Snowflake has provided extensive controls for limiting consumption. But the downside is that compute-based pricing makes people avoid running queries that may be valuable to the business.

Charging based on compute also gives Snowflake the wrong incentive. Any optimization that gets more performance out of the same nodes saves you money. But Snowflake would lose money, since you would buy less compute, which is by far its biggest source of revenue.

Firebolt cost

“We tested Firebolt on a 39B record dataset and saw a 90% cost reduction with a 3x performance uplift vs Snowflake.”

- Head of Data at a marketing tech company

Firebolt has repeatedly delivered 10x or greater price-performance through a combination of greater compute efficiency per node, choice of resources for each workload, and the ability to optimize those resources. Firebolt also focuses on simplifying administration by automating several of the more complicated tasks. For example, you can resize an engine with a few clicks, and Firebolt handles all the reprovisioning. It automatically partitions data across differently sized nodes, and configures and updates indexes. It also automatically rebalances data in F3 over time.

Firebolt does expose more tuning options, which requires someone to spend time selecting instance types or configuring indexes in the administration console. But this is comparable to the time administrators might spend in Snowflake trying to improve performance without having the control to do so. Snowflake administrators often analyze performance using the Query Profile to look at the time spent on operations before deciding to rewrite a query, or increase and decrease the size of a cluster based on disk spillage as an indicator of available RAM (since they do not know the actual size of the instance type).
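That spillage analysis can be done with a standard Snowflake account-usage query, for example:

    -- Queries that spilled out of RAM in the last week: a signal the nodes are too small.
    SELECT query_id,
           warehouse_name,
           bytes_spilled_to_local_storage,
           bytes_spilled_to_remote_storage
    FROM snowflake.account_usage.query_history
    WHERE start_time > DATEADD('day', -7, CURRENT_TIMESTAMP())
      AND bytes_spilled_to_local_storage > 0
    ORDER BY bytes_spilled_to_remote_storage DESC;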

The big differences in cost are in the total cost of compute, and the related pricing. In terms of compute costs, Firebolt is 10x more efficient per node. It is much more efficient with CPU and RAM by leveraging query optimizations, including indexing and native semi-structured data storage. It is more efficient with SSD by only storing the needed data instead of fetching and caching entire micro-partitions. It also allows you to choose the best instances and combination of CPU, RAM, SSD and disk to optimize price-performance for each type of ingestion and query workload. You can choose a small cluster with massive nodes, or a massive cluster with small nodes, depending on the type of computing. Both are much less expensive than Snowflake.

In terms of pricing, Firebolt is more transparent. Customers see the exact price as they choose any Amazon on-demand or spot instances. Choosing spot instances can reduce compute costs to 20-30% of the cost of on-demand nodes.

In a recent PoC, a Firebolt client compared Firebolt and Snowflake across 5 real-world analytical queries over a 0.5 TB data set:

Query | Snowflake: Large warehouse, $16/hour | Firebolt: 1 x c5d.4xlarge engine, $1.54/hour | Perf. boost
#1 | 72 sec | 0.09 sec | 800x
#2 | 62 sec | 0.16 sec | 388x
#3 | 90 sec | 1.88 sec | 48x
#4 | 60 sec | 0.01 sec | 6,000x
#5 | 83 sec | 1.6 sec | 52x

The combination of 10x greater efficiency per node and the choice of instance type and count for each engine has enabled companies using Firebolt to deliver the same computing at 10x lower cost than with Snowflake. Users can run more queries and get more value out of their data as a result, because they no longer limit their usage for fear of blowing the budget.

Analytics use cases

Over the last decade, not only has the volume, variety and velocity of data changed, so has the use of analytics and data. Until recently, the most common use of data warehouses involved analysts and managers using reports and dashboards to analyze historical data. The data was typically extracted, transformed, and loaded (ETL) from applications into data warehouses.

Two big trends have completely changed analytics. The first was a shift away from centralized decision making by analysts and managers to real-time decisions by employees and customers, which requires self-service analytics that can analyze historical and near real-time information.

The second change was the explosive growth of Big Data, in part to support more real-time decisions. In 1992, Walmart was the first to reach a 1 terabyte (TB) data warehouse. Ten years ago, some data warehouses reached 1 petabyte (PB). This growth has been driven by newer types of data, including streams of data from connected customers, devices and applications.

Today most companies have the following types of analytics:

- Reporting and dashboards on historical data
- Interactive and ad hoc analytics
- Operational and customer-facing analytics
- Data science and machine learning

There could be a host of different operational analytics systems, from IT monitoring and network telemetry to customer-facing analytics that a company sells to its end customers about its products: anything from financial assets, advertisements and automobiles to sports or games. Some operational analytics are even delivered as a (customer-facing) service by SaaS vendors.

Snowflake analytics use cases

Snowflake is by design a data warehouse as a service. It was created nearly a decade ago to help companies move existing traditional data warehouse workloads into the cloud. In short, Snowflake supports BI reporting and daily dashboard use cases really well. It has enabled companies to move these traditional analytics workloads into the cloud, and to outsource their data warehouse infrastructure and its management.

But Snowflake is not good for:

- Ad hoc analytics, where most queries are first-time queries that take tens of seconds to minutes
- Semi-structured data, which must be loaded entirely into RAM and processed with full scans
- Continuous (streaming) ingestion, which is limited by batch-centric storage and write queues
- Operational and customer-facing analytics that need sub-second performance on changing data

It can also be a very costly solution for high concurrency (user and query) workloads, given its costs as it scales (see the cost section).

This means that while Snowflake is well suited for traditional analytics, it is not suited for ad hoc, big data, operational and customer-facing analytics, the same workloads that pushed existing data warehouses beyond their limits over the past decade.

Firebolt analytics use cases

Firebolt is a 3rd generation data warehouse, built over the last few years, that is by design meant to address several of these more recent analytics challenges and use cases. Firebolt is designed for:

- High performance, interactive ad hoc analytics from gigabyte to petabyte scale
- Fast, efficient analytics on semi-structured data
- Operational and customer-facing analytics with high concurrency and continuous ingestion

While Firebolt can address reporting and dashboard use cases well, where it shines relative to Snowflake is in the other use cases.

Conclusion

While Snowflake decoupled storage and compute, which in turn simplified scalability and administration, and helped move traditional data warehouse workloads to the cloud, it has not addressed the newer analytics needs driven by the rise of big data and the need for real-time responsiveness in businesses today.

The promise of the cloud has always been to bring not just lower costs, but also the latest innovations into companies. 3rd generation data warehouses like Firebolt have added innovations that improve performance and cost by an order of magnitude. Now companies can support true ad hoc, high performance, big data, streaming analytics, operational and customer-facing analytics at scale.

Taking advantage of Firebolt does not mean you have to replace Snowflake. Companies that already use Snowflake have simply added Firebolt as another cloud data warehouse for the newer use cases where Snowflake is not working for them. Today’s modern data pipelines and data lakes have made adding another cloud warehouse relatively straightforward.

About Firebolt

Firebolt is the world’s fastest cloud data warehouse, purpose-built for high performance analytics. It provides orders of magnitude faster query performance at a fraction of the cost of the alternatives by combining the simplicity, elasticity and low cost of the cloud with the latest innovations in analytics. Companies that have adopted Firebolt have been able to deploy data warehouses in weeks and deliver sub-second performance at terabyte to petabyte scale for a wide range of interactive, high performance analytics, across internal BI as well as customer-facing analytics use cases.