We often get asked “what’s the difference between Firebolt and Snowflake?” and it reminds me of Frozen. Now, I am not as good as Elsa from Frozen singing “Let It Go.” If I were, I probably would not be writing this blog. But I do think about how to explain when to “Let Snow Go.” When does a “Fast Firebolt” replace “Slow Snow” for a workload, and more importantly, why is Slowflake slower?
(Please note: this decision is not really about whether to replace Snowflake. The decision is more whether to have multiple analytics engines and when to run certain workloads on something other than Snowflake.)
If you are an existing Snowflake customer - or if you are using Redshift, Athena, Azure Synapse or Google BigQuery for that matter - you probably know this: Snowflake is good for traditional data warehouse workloads such as reporting and dashboards. That is what Snowflake was built for: moving traditional batch-based reporting and dashboard-based analytics to the cloud.
But Snowflake is not good for:
- Ad hoc analytics or anything requiring sub-second, first-time query performance
- Large complex queries against massive data sets
- Semi-structured data queries
- Streaming analytics or continuous ingestion using technologies like Kafka
Like all the other 1st and 2nd generation cloud data warehouses, Snowflake has limitations that make it not fast enough, too expensive, or both. Here’s a summary of the biggest challenges:
- Batch-based ingestion: Traditional data warehouses have always been batch-oriented for a variety of reasons. When Snowflake targeted traditional data warehouse use cases, it kept that limitation. Snowflake locks an entire partition with each write, and limits write queues to 20 writes per table. Snowpipe and other loading mechanisms tend to batch in intervals of 1 minute or more. It is not designed for continuous ingestion.
- Query performance: The first time data is needed, Snowflake pulls data from storage into a virtual warehouse and puts it into local storage. Once a query is executed, the results are stored in a cache. While this speeds up executing the same query, or executing different queries against the same data, the first query is slow. According to the latest Fivetran benchmarks, for example, queries against 1 terabyte on an 8 node virtual warehouse take 8 seconds on average, and some take minutes. That warehouse costs $16-32 an hour, which is roughly $150,000-$300,000 annually if it runs around the clock.
- Semi-structured data: Snowflake can ingest JSON, capture metadata about it, store it directly in a VARIANT field (column) and process the JSON directly using native functionality. But it is slow. Whenever Snowflake executes queries or calculations, it has to load all the JSON into available RAM first, and then do full scans to find specific fields. To get enough RAM, you need to keep doubling the size of your Snowflake virtual warehouse, which grows each node a little, until each individual node is big enough. Otherwise the JSON will cause data to spill over to disk, which makes performance plummet.
- Cost: Perhaps you have heard of the term “credit fever,” a story of a query that broke the budget, or rules like killing a Snowflake process that runs for more than x minutes. Snowflake charges for compute. If multi-cluster auto scaling is on, the wrong queries can devour credits. High performance or semi-structured data workloads can also consume credits because each larger warehouse size doubles the cost.
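To make the cost arithmetic above concrete, here is a minimal Python sketch. It assumes the warehouse runs around the clock, that each node costs $2-4 an hour (which is what the $16-32 figure for 8 nodes implies), and that each size step doubles the node count. The function names are purely illustrative and are not part of any Snowflake API.

```python
HOURS_PER_YEAR = 24 * 365  # assumption: the warehouse never suspends


def hourly_cost(nodes: int, price_per_node_hour: float) -> float:
    """Hourly cost of a warehouse with the given number of nodes."""
    return nodes * price_per_node_hour


def annual_cost(nodes: int, price_per_node_hour: float) -> float:
    """Annual cost if the warehouse runs continuously."""
    return hourly_cost(nodes, price_per_node_hour) * HOURS_PER_YEAR


def size_up(nodes: int, steps: int = 1) -> int:
    """Node count after moving up `steps` warehouse sizes (each step doubles)."""
    return nodes * 2 ** steps


if __name__ == "__main__":
    # Assumed price range per node-hour, derived from the $16-32 figure.
    for price in (2.0, 4.0):
        print(f"8 nodes at ${price:.0f}/node-hour: ${annual_cost(8, price):,.0f} per year")
    # Moving up one size doubles the node count, and therefore the bill.
    print(hourly_cost(size_up(8), 2.0) / hourly_cost(8, 2.0))
```

Under these assumptions the annual figures come out at about $140,000 and $280,000, in the same ballpark as the $150,000-$300,000 quoted above. A warehouse that auto-suspends when idle would cost proportionally less.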
Firebolt, as a 3rd generation cloud data warehouse, added specific analytics innovations on top of the best of Snowflake - including innovations in storage, indexing and query optimization - to improve performance and cost. The combination has made Firebolt much better and more cost effective for ad hoc interactive, high performance, semi-structured data, and operational or customer-facing analytics that require continuous ingestion.
Some key differences from the more detailed comparison between Firebolt and Snowflake:
- Performance: Firebolt has been up to 182x faster than the alternatives. One customer achieved 3x faster performance and 10x lower cost, or a 30x price-performance advantage compared to their Snowflake deployment. Firebolt’s product showdown demo shows a first-time query executing in roughly 1.3 seconds on an 8 node cluster that costs roughly $1.70 an hour … against over 20 terabytes and 42 billion rows of data. That is roughly a 250x scale-price-performance advantage compared to the Fivetran benchmark.
- Semi-structured data: Firebolt stores JSON natively in a nested structure and provides native Lambda functionality within compliant SQL that does not require full scans or loading the data into RAM. The performance advantage is even greater in this case.
- Cost: Firebolt lets you choose your AWS instance types and number of nodes for each cluster. You can have 4 massive nodes if you choose. With Firebolt, each node can deliver 10x greater efficiency, and you see the price of each node type as you select it, giving you the ability to choose the best price-performance. This combination of greater control and choice over resources, greater efficiency per node, and different pricing is what leads to a 10-100x lower total cost.
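The 30x customer figure in the performance bullet is simple multiplication. Here is a small Python sketch, assuming price-performance is defined as the speedup multiplied by the cost reduction (the definition that turns 3x and 10x into 30x). The function name is hypothetical.

```python
def price_performance(speedup: float, cost_reduction: float) -> float:
    """Combined advantage, assuming the two factors multiply.

    speedup:        how many times faster the queries run
    cost_reduction: how many times cheaper the deployment is
    """
    if speedup <= 0 or cost_reduction <= 0:
        raise ValueError("both factors must be positive")
    return speedup * cost_reduction


if __name__ == "__main__":
    # The customer example: 3x faster at 10x lower cost.
    print(price_performance(3.0, 10.0))
```

The same function works for any workload where you have measured both factors yourself, which is a more reliable basis for a decision than anyone’s published benchmark.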