Snowflake is a great second-generation cloud data warehouse. It was one of the first cloud data warehouse vendors to separate storage and compute for elastic scalability, and it simplified administration further.
Firebolt took several of the ideas in Snowflake and added a focus on improving performance at scale while lowering costs. Our customers demonstrate 3x the performance at 1/10th the cost of Snowflake, or a 30x price-performance advantage.
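The 30x figure is simply the speedup divided by the relative cost. As a minimal sketch of that arithmetic (the function name is ours, for illustration only, and exact fractions are used to avoid floating-point rounding):

```python
from fractions import Fraction


def price_performance_advantage(speedup: Fraction, cost_ratio: Fraction) -> Fraction:
    """Return performance per unit of cost, relative to a baseline of 1.

    speedup    -- how many times faster than the baseline (3 means 3x)
    cost_ratio -- cost relative to the baseline (1/10 means one tenth the cost)
    """
    if speedup <= 0 or cost_ratio <= 0:
        raise ValueError("speedup and cost_ratio must be positive")
    return speedup / cost_ratio


# The figures quoted above: 3x the performance at 1/10th the cost.
advantage = price_performance_advantage(Fraction(3), Fraction(1, 10))
print(advantage)  # 30
```

The same function can be used with your own measured speedup and cost ratio to see what the combined advantage would be for a specific workload.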
Firebolt is a full-fledged data warehouse and stands up well in a head-to-head comparison. That said, some existing Snowflake customers decide to run both and move specific workloads to Firebolt: ad hoc analytics, sub-second SLAs, semi-structured (JSON) data, or queries over massive data sets. Those are the most expensive workloads in Snowflake, so these customers focus on the biggest savings first.
Some of the key differences:
- Indexing: Snowflake prunes partitions during queries using data ranges, but it does not use indexes, which are critical to improving performance. Firebolt uses sparse, aggregate and join indexes that are optimized together with the query engine and storage to improve query performance and to support both continuous and batch updates.
- Query optimization: Snowflake supports vectorized query execution and does some cost-based optimization, but the first run of a query typically takes seconds to minutes. Snowflake has added local disk “caching” and a result cache to speed up subsequent queries for repetitive workloads like reporting and dashboards, but these do not work well for ad hoc interactive queries. Firebolt was designed to deliver sub-second performance the first time a query runs. Features include vectorized processing, JIT compilation, cost-based optimization, indexing, the F3 storage format, which manages data from RAM to disk, and a host of tuning options to improve query performance and prioritization.
- Pricing: Snowflake pricing is based on the amount of compute and storage it sells you, so it has no incentive to improve the performance or efficiency of each node. Firebolt pricing is based only on the amount of your data in Firebolt. We decided not to charge a markup on AWS compute or storage. This lets us focus on optimizing Firebolt to get the most out of each node and each unit of storage.
- Choice of engine and node types: With Snowflake you can only choose 1, 2, 4, 8, 16, 32, 64, or 128 nodes. The nodes grow larger, but you don’t know the details and have no other choices. Firebolt lets you choose almost any instance type, size and number of nodes for each engine (an engine is similar to a Snowflake virtual warehouse). For example, you can provision an engine as a small number of very large AWS instances.
- Semi-structured data: Snowflake puts semi-structured data into a VARIANT field. While it does build metadata to help with processing, all the JSON for a node must be fully loaded into the available RAM, and the engine performs full scans. This is slow even when there is enough RAM, and becomes dramatically slower if it spills to disk. Your only option is to increase the warehouse size, which doubles the number of nodes and the cost with each step up, until you have enough RAM. Firebolt has native nested storage and Lambda expressions you can use in SQL. They are efficient in storage, do not require loading into RAM or full scans, and deliver much faster performance with far fewer resources.
- Continuous updates: Snowflake uses partition-level locking and has a limit of 20 DML writes in the queue per table, which restricts its ability to support continuous updates from Kafka and other streaming technologies. Firebolt is one of the few data warehouses that can support continuous updates at any scale, by enabling multi-master writes directly into F3 across instances without table or partition locking.
- Optimized storage: Snowflake has a (micro-)partition file system that is more optimized than S3 and supports partitioning and sorting with cluster keys. But each write requires a rewrite of the partition. Partly for this reason, Snowflake does not support continuous writes. The Firebolt File Format (F3) spans multiple tiers, from RAM during ingestion to SSD and disk. It rebalances automatically and supports multi-master continuous ingestion, single-row inserts, upserts and deletes, all with immediate query visibility. It is also optimized with multiple types of indexing for performance down to the OS level.
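To make the cost of power-of-two sizing concrete, here is a rough sketch of how much capacity you end up paying for when RAM is the constraint, as in the semi-structured data case above. The node counts come from the list above. Everything else is an assumption for illustration: the function name, the fixed 16 GB of RAM per node, the 300 GB working set, and the idea that cost scales with node count. Real per-node details are not published.

```python
# Node counts quoted above for the available Snowflake warehouse sizes.
NODE_COUNTS = [1, 2, 4, 8, 16, 32, 64, 128]


def smallest_fitting_size(required_ram_gb: float, ram_per_node_gb: float) -> int:
    """Return the smallest allowed node count whose total RAM covers the need.

    Assumes (hypothetically) that every node has the same amount of RAM.
    """
    if required_ram_gb <= 0 or ram_per_node_gb <= 0:
        raise ValueError("RAM values must be positive")
    for nodes in NODE_COUNTS:
        if nodes * ram_per_node_gb >= required_ram_gb:
            return nodes
    raise ValueError("working set does not fit in the largest size")


# Hypothetical workload: 300 GB of JSON must fit in RAM, 16 GB per node.
required_gb = 300
per_node_gb = 16
nodes = smallest_fitting_size(required_gb, per_node_gb)

# 16 nodes give 256 GB (too small), so the next allowed step is 32 nodes.
overprovision = nodes * per_node_gb / required_gb
print(nodes, round(overprovision, 2))
```

Under these assumptions, about 19 nodes would be enough, but the next available size is 32, so you provision and pay for roughly 1.7 times the RAM you need. Being able to pick the instance type and node count independently avoids that jump.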