Databricks was built by the founders of Spark as an analytics platform to support machine learning use cases. It leverages the Spark framework to process data residing in a data lake and is supported on AWS, GCP and Azure. Databricks coined the marketing term “Lakehouse '' architecture to illustrate the unification of data lake and data warehouse use cases. Customers still manage Spark clusters that process data residing in a Delta lake. Conversion of data to Delta Lake format is required to leverage the functionality of Delta Lake. Databricks Sql is a relatively new addition to simplify access to data stored in a data lake.
Firebolt is built on a natively decoupled storage & compute architecture, on AWS only. Data has to be copied outside of your VPC into the Firebolt, where both your compute and data run in a dedicated and isolated tenant. A “Firebolt Engine” can be granularly configured across # of nodes and different CPU/RAM/SSD combinations.
Databricks allow for autoscaling of clusters based on utilization. Additionally, increasing concurrency associated with a sql endpoint can be accomplished through the addition of clusters. Query concurrency per cluster is maxed at 10. However, scaling with additional clusters for concurrency is possible. Databricks provides a choice of instance types.
Firebolt can handle the largest data volumes and concurrency on a single comparable cluster size, thanks to its superior hardware efficiency. Thanks to its decoupled storage & compute architecture it scales very well to large data volumes. However, resizing an engine size isn’t instant and requires orchestration if avoiding downtime is necessary. A single Firebolt engine can support hundreds of concurrent queries, avoiding the need to scale out for most use cases. Scaling horizontally for even higher concurrency is manual.
Databricks is designed to leverage the Spark framework for processing large volumes of data. It leverages compressed Parquet files in a Delta Lake. To reduce the amount of data processed, it uses data pruning on partitions and Parquet file metadata. Databricks does not provide any indexes.
Firebolt is the fastest when it comes to query performance when compared to cloud data warehouses and services like Athena. Its unique approach to storage and indexing results in highly aggressive data pruning that scans dramatically less data compared to other technologies. While other technologies scan partitions or micro-partitions, Firebolt works with indexed data ranges, that are significantly smaller. In addition, Firebolt lets users accelerate queries further with multiple index types (Aggregating index, Join index), and using its decoupled storage & compute architecture workloads can be easily isolated to guarantee consistent performance.
Databricks is a mature Spark based platform proven for processing streaming data. It is widely used for Machine Learning use cases by data scientists through the use of integrated notebooks. From a low latency query perspective, while it offers features like Delta Cache, it does not provide specialized indexes that can deliver low latency queries.
Firebolt stands out by being the fastest cloud data warehouse when compared to Snowflake, Redshift, BigQuery and Athena. It’s great for delivering sub-second analytics at scale, while remaining hardware efficient and high concurrency friendly. This makes it a great choice for operational use cases and customer-facing data apps. Given that it is not as feature-rich and integration rich as the more mature data warehouses makes it a lesser fit for a general-purpose Enterprise data warehouse. It is also not the best fit for ad-hoc use cases, because of the need to predefine indexing at the table level.