Redshift has the oldest architecture, being the first Cloud DW in the group. Its architecture wasn’t designed to separate storage & compute. While it now has RA3 nodes which allow you to scale compute and only cache the data you need locally, all compute still operates together. You cannot separate and isolate different workloads over the same data, which puts it behind other decoupled storage/compute architectures. Redshift runs as an isolated tenant per customer, and unlike other cloud data warehouses, it is deployed in your VPC.
Firebolt is built on a natively decoupled storage & compute architecture, on AWS only. Data has to be copied outside of your VPC into the Firebolt, where both your compute and data run in a dedicated and isolated tenant. A “Firebolt Engine” can be granularly configured across # of nodes and different CPU/RAM/SSD combinations.
Redshift is limited in scale because even with RA3, it cannot distribute different workloads across clusters. While it can scale to up to 10 clusters automatically to support query concurrency, it can only handle a maximum of 50 queued queries across all clusters by default.
Firebolt can handle the largest data volumes and concurrency on a single comparable cluster size, thanks to its superior hardware efficiency. Thanks to its decoupled storage & compute architecture it scales very well to large data volumes. However, resizing an engine size isn’t instant and requires orchestration if avoiding downtime is necessary. A single Firebolt engine can support hundreds of concurrent queries, avoiding the need to scale out for most use cases. Scaling horizontally for even higher concurrency is manual.
Redshift does provide a result cache for accelerating repetitive query workloads and also has more tuning options than some others. But it does not deliver much faster compute performance than other cloud data warehouses in benchmarks. Sort keys can be used to optimize performance, but their contribution is limited. There is no support for indexes, and low-latency analytics at large data volumes is hard to achieve. Because Redshift decoupling of storage & compute is limited compared to other cloud data warehouses, it doesn’t support isolating workloads, which means performance can degrade under pressure and competition for resources.
Firebolt is the fastest when it comes to query performance when compared to cloud data warehouses and services like Athena. Its unique approach to storage and indexing results in highly aggressive data pruning that scans dramatically less data compared to other technologies. While other technologies scan partitions or micro-partitions, Firebolt works with indexed data ranges, that are significantly smaller. In addition, Firebolt lets user accelerate queries further with multiple index types (Aggregating index, Join index), and using its decoupled storage & compute architecture workloads can be easily isolated to guarantee consistent performance.
Redshift was originally designed to support traditional internal BI reporting and dashboard use cases for analysts. As such, it is typically used as a general-purpose Enterprise data warehouse. With deep integrations into the AWS ecosystem, it can also leverage AWS ML service, making it also useful for ML projects. However, given the coupling of storage & compute, and the difficulty in delivering low-latency analytics at scale, it is less suited for operational use cases and customer-facing use cases like Data Apps. The coupling of storage and compute, together with the need to predefine sort & dist keys for optimal performance, make it challenging to use for Ad-Hoc analytics.
Firebolt stands out by being the fastest cloud data warehouse when compared to Snowflake, Redshift, BigQuery and Athena. It’s great for delivering sub-second analytics at scale, while remaining hardware efficient and high concurrency friendly. This makes it a great choice for operational use cases and customer-facing data apps. Given that it is not as feature-rich and integration rich as the more mature data warehouses makes it a lesser fit for a general-purpose Enterprise data warehouse. It is also not the best fit for ad-hoc use cases, because of the need to predefine indexing at the table level.