Druid is an OLAP engine designed to provide fast real time analytics. Druid adopts a clustered architecture with servers that host various role specific processes. These processes address real time and batch ingestion, indexing, querying of historical and real time data. Apache Druid can be deployed as a virtual machine or a Kubernetes based cluster. Druid does not support a decoupled compute & storage architecture. Deep storage in the form of object storage is used to replicate data to.
Firebolt is built on a natively decoupled compute & storage architecture, on AWS only. Data has to be copied outside of your VPC into the Firebolt, where both your compute and data run in a dedicated and isolated tenant. A “Firebolt Engine” can be granularly configured across # of nodes and different CPU/RAM/SSD combinations.
Druid provides the ability to handle fast ingest and high concurrency. Custom sizing and cluster tuning are required to balance the compute, memory, storage needs of each process within Druid and to provide high concurrency. Druid clusters can be grown by adding nodes with automatic rebalancing of storage segments assigned to nodes.
Firebolt can handle the largest data volumes and concurrency on a single comparable cluster size, thanks to its superior hardware efficiency. Thanks to its decoupled storage & compute architecture it scales very well to large data volumes. However, resizing an engine size isn’t instant and requires orchestration if avoiding downtime is necessary. A single Firebolt engine can support hundreds of concurrent queries, avoiding the need to scale out for most use cases. Scaling horizontally for even higher concurrency is manual.
Druid provides high performance through columnar storage format, parallel processing, bitmap indexes and roll-ups. Druid, however, recommends a denormalized data model for performance needs. Join operations in Druid are a relatively new feature with various limitations, especially if there is a need to join large datasets.
Firebolt is the fastest when it comes to query performance when compared to cloud data warehouses and services like Athena. Its unique approach to storage and indexing results in highly aggressive data pruning that scans dramatically less data compared to other technologies. While other technologies scan partitions or micro-partitions, Firebolt works with indexed data ranges, that are significantly smaller. In addition, Firebolt lets users accelerate queries further with multiple index types (Aggregating index, Join index), and using its decoupled storage & compute architecture workloads can be easily isolated to guarantee consistent performance.
Druid is designed as an OLAP engine to provide fast access to aggregations that are run against large volumes of data. Druid is typically used for customer facing analytics and streaming data processing. Druid is used as an add-on with other data warehousing products that are efficient at scaling, joining, and filtering large volumes of data. It is not a suitable option for data warehouse replacement.
Firebolt stands out by being the fastest cloud data warehouse when compared to Snowflake, Redshift, BigQuery and Athena. It’s great for delivering sub-second analytics at scale, while remaining hardware efficient and high concurrency friendly. This makes it a great choice for operational use cases and customer-facing data apps. Given that it is not as feature-rich and integration rich as the more mature data warehouses makes it a lesser fit for a general-purpose Enterprise data warehouse. It is also not the best fit for ad-hoc use cases, because of the need to predefine indexing at the table level.