# Redshift vs Databricks (/comparison/redshift-vs-databricks)


## Architecture [#architecture]

The biggest differences among cloud data warehouses are whether they separate storage and compute, how much they isolate data and compute, and which clouds they can run on.

| Feature                                           | Databricks                                                                                                                                                                                                 | Redshift                                                                                                           | Firebolt                                                                                                                                                                                                                                                                                                                                         |
| ------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Separation of storage and compute                 | Yes                                                                                                                                                                                                        | RA3 instances enable separation of compute and storage, but limited workload isolation compared to other platforms | Yes, separation of storage and metadata as well as compute from compute with full workload isolation.                                                                                                                                                                                                                                            |
| Supported cloud infrastructure                    | AWS, Azure, GCP. Marketplaces and BYOC                                                                                                                                                                     | AWS only                                                                                                           | AWS (GCP coming soon) & anywhere (Firebolt Core)                                                                                                                                                                                                                                                                                                 |
| Isolated tenancy – option for dedicated resources | • Control plane in Databricks account<br />• Data plane in customer VPC (optional)<br />• Storage in customer VPC<br />• Serverless SQL runs in Databricks account with private connectivity | • Isolated tenant & resources<br />• Runs in your VPC | • Multi-tenant metadata layer<br />• Isolated tenancy for compute & storage per client |
| Control vs abstraction of compute                 | • Configurable clusters and instance types<br />• Serverless SQL warehouses (GA 2025) run in Databricks account with private connectivity, no public IPs<br />• Pro/Classic warehouses run in customer VPC | • Configurable cluster size<br />• Configurable compute types                                                      | Uses engine abstraction:<br />• Each engine has configurable cluster size (1-128 nodes) for horizontal scaling.<br />• Configurable compute family (compute vs storage optimized) and type (XS, S, M, L, XL) for vertical scaling<br />• Number of clusters for concurrency (auto)scaling.<br />Provides full workload isolation across engines. |
| Self-hosted and hybrid deployment options         | • Databricks on customer cloud accounts<br />• Unity Catalog for hybrid governance                                                                                                                         | Limited hybrid options with Redshift Serverless                                                                    | • Firebolt Core: Forever free, self-hosted edition with full query engine capabilities<br />• Same performance and features as managed service<br />• Deploy anywhere: local laptop, cloud, datacenter, Kubernetes<br />• Production-grade distributed architecture<br />• No usage restrictions except building competing SaaS                  |
| ACID Compliance and Transactions                  | • ACID transactions with Delta Lake<br />• Time travel and versioning<br />• Concurrent read/write operations                                                                                              | ACID compliant at table level with some limitations on concurrent operations                                       | • Full ACID compliance with snapshot isolation<br />• Multi-statement transactions supported<br />• Strong consistency across all operations<br />• Supports concurrent reads and writes<br />• Transactional integrity for data applications                                                                                                    |

**Redshift** has the oldest architecture, being the first cloud data warehouse in the group. Its architecture wasn't designed to separate storage & compute. While it now has RA3 nodes, which allow you to scale compute and cache only the data you need locally, all compute still operates together. You cannot separate and isolate different workloads over the same data, which puts it behind other decoupled storage/compute architectures. Redshift runs as an isolated tenant per customer and, unlike other cloud data warehouses, it is deployed in your VPC. Redshift also offers a serverless option based on an abstracted unit called the Redshift Processing Unit (RPU), configurable from 8 to 512 in increments of 8. Each RPU provides 2 vCPU and 16 GB RAM, so the minimum of 8 RPU is equivalent to 16 vCPU / 128 GB RAM.
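The RPU arithmetic above can be sketched in a few lines of Python. This is only an illustration of the figures quoted in the text (2 vCPU and 16 GB RAM per RPU, 8 to 512 RPU in steps of 8); the function name `rpu_resources` is hypothetical and not part of any AWS SDK.

```python
def rpu_resources(rpu: int) -> tuple[int, int]:
    """Return (vCPU, RAM in GB) for a Redshift Serverless RPU setting.

    Hypothetical helper: it only encodes the figures quoted in the text.
    """
    # Range and step as stated in the text: 8 to 512 in increments of 8.
    if not 8 <= rpu <= 512 or rpu % 8 != 0:
        raise ValueError(f"RPU must be a multiple of 8 between 8 and 512, got {rpu}")
    # Per-RPU figures as stated in the text: 2 vCPU and 16 GB RAM.
    return rpu * 2, rpu * 16


for rpu in (8, 64, 512):
    vcpu, ram_gb = rpu_resources(rpu)
    print(f"{rpu} RPU -> {vcpu} vCPU / {ram_gb} GB RAM")
```

The first line printed is `8 RPU -> 16 vCPU / 128 GB RAM`, matching the minimum configuration described above.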

**Databricks** was built by the founders of Spark as an analytics platform to support machine learning use cases. It leverages the Spark framework to process data residing in a data lake and is supported on AWS, GCP and Azure. Databricks coined the marketing term "Lakehouse" architecture to illustrate the unification of data lake and data warehouse use cases. Customers still manage Spark clusters that process data residing in a Delta Lake, and data must be converted to the Delta Lake format to use Delta Lake functionality. Databricks SQL is a relatively new addition that simplifies access to data stored in a data lake.

**Firebolt** is built on a natively decoupled storage & compute architecture, on AWS only. Data has to be copied out of your VPC into Firebolt, where both your compute and data run in a dedicated and isolated tenant. A "Firebolt Engine" can be granularly configured across the number of nodes and different CPU/RAM/SSD combinations.

## Scalability [#scalability]

There are three big differences among data warehouses and query engines that limit scalability: decoupled storage and compute, dedicated resources, and continuous ingestion.

| Feature                                                         | Databricks                                                                                                                                                                                                                                                                                                                                                        | Redshift                                                                                                                       | Firebolt                                                                                                                                                                                     |
| --------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Elasticity – Scaling for larger data volumes and faster queries | Autoscaling clusters based on workload demand. Serverless SQL warehouses provide near-instant scaling (2-6 seconds startup)                                                                                                                                                                                                                                       | Available via Elastic Resize – slow and limited, downtime required                                                             | Granular cluster resize with node types, number of nodes and number of clusters. Zero downtime.                                                                                              |
| Elasticity – Scaling for higher concurrency                     | • 10 concurrent queries per cluster limit<br />• Scales up to 40 clusters per warehouse (400 total concurrent queries)<br />• Serverless SQL warehouses provide near-instant autoscaling<br />• Pro/Classic warehouses take several minutes to provision new clusters<br />• Real-world performance degradation typically occurs at 50-150 concurrent queries depending on complexity | • 5 concurrent queries per WLM queue by default (up to 8 queues)<br />• Concurrency Scaling enables thousands of concurrent queries | A single engine can handle hundreds of concurrent queries. Engines auto-scale the number of clusters up and down based on resource usage thresholds. Idle engines scale down to zero billing. |

**Redshift** is limited in scale because even with RA3, it cannot distribute different workloads across clusters. While it can automatically scale up to 10 clusters to support query concurrency, it can only handle a maximum of 50 queued queries across all clusters by default.

**Databricks** allows for autoscaling of clusters based on utilization. Query concurrency per cluster is capped at 10, but the concurrency of a SQL endpoint can be increased by adding clusters. Databricks also provides a choice of instance types.
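As a rough sizing aid, the relationship between concurrency and cluster count can be sketched as follows. The sketch assumes the 10-queries-per-cluster limit stated above and the 40-cluster cap per warehouse listed in the scalability table; the function name `plan_clusters` and the queueing behaviour it models are illustrative assumptions, not a Databricks API.

```python
import math

QUERIES_PER_CLUSTER = 10  # per-cluster concurrency limit stated in the text
MAX_CLUSTERS = 40         # per-warehouse cluster cap from the scalability table


def plan_clusters(concurrent_queries: int) -> tuple[int, int]:
    """Return (clusters to run, queries left waiting) for a target concurrency.

    Hypothetical helper: a back-of-the-envelope model, not vendor behaviour.
    """
    if concurrent_queries < 0:
        raise ValueError("concurrent_queries must be non-negative")
    # Round up, and keep at least one cluster running.
    needed = max(1, math.ceil(concurrent_queries / QUERIES_PER_CLUSTER))
    clusters = min(needed, MAX_CLUSTERS)
    # Anything beyond the warehouse capacity has to wait.
    waiting = max(0, concurrent_queries - clusters * QUERIES_PER_CLUSTER)
    return clusters, waiting


for target in (25, 400, 450):
    clusters, waiting = plan_clusters(target)
    print(f"{target} concurrent queries -> {clusters} clusters, {waiting} waiting")
```

Under these assumptions, 25 concurrent queries need 3 clusters, while 450 exceed the 400-query capacity of a fully scaled warehouse and leave 50 waiting.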

**Firebolt** can handle the largest data volumes and concurrency on a single comparable cluster size, thanks to its superior hardware efficiency. Its decoupled storage & compute architecture scales very well to large data volumes. However, resizing an engine isn't instant and requires orchestration if downtime must be avoided. A single Firebolt engine can support hundreds of concurrent queries, avoiding the need to scale out for most use cases. Scaling horizontally for even higher concurrency is manual.

## Performance [#performance]

Performance is the biggest challenge with most data warehouses today.
While decoupled storage and compute architectures improved scalability and simplified administration, for most data warehouses they introduced two bottlenecks: storage and compute. Most modern cloud data warehouses fetch entire partitions over the network instead of just fetching the specific data needed for each query. While many invest in caching, most do not invest heavily in query optimization. Most vendors also have not improved continuous ingestion or semi-structured data analytics performance, both of which are needed for operational and customer-facing use cases.

| Feature                                                      | Databricks                                                                                                                                                                                                                                                                                                                                                                                           | Redshift                                                                                                                                                                                                  | Firebolt                                                                                                                                                                                                                                                                                       |
| ------------------------------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Indexes                                                      | None                                                                                                                                                                                                                                                                                                                                                                                                 | None                                                                                                                                                                                                      | • Sparse primary indexes<br />• Aggregating indexes<br />• Join indexes<br />• Optimizer driven index usage                                                                                                                                                                                    |
| Compute tuning                                               | Choice of cluster type, node types including SSD-optimized instances. Serverless provides automatic resource allocation with Intelligent Workload Management (IWM)                                                                                                                                                                                                                                   | Choice over number of nodes and their type                                                                                                                                                                | SQL defined engines. Control number of nodes, node family and type per cluster, with one or more clusters per engine. Multiple engines isolate workloads.                                                                                                                                      |
| Storage format                                               | • Delta Lake format with Liquid Clustering (February 2025 – replaces Z-ordering and traditional partitioning)<br />• Cannot use Liquid Clustering alongside Z-ordering on same table<br />• Allows for sorted data in Delta Lake<br />• Requires Optimize to maintain ordering | Columnar & compressed storage (RA3 nodes) | Columnar, sorted & compressed & sparsely indexed storage (F3 – Firebolt File Format) with native Apache Iceberg support |
| Table-level partition & pruning techniques                   | • Table level partitioning<br />• Liquid Clustering for improved query performance and reduced data skew (February 2025)<br />• Z-ordering (legacy, replaced by Liquid Clustering)<br />• Periodic optimization of storage required | • No table partitions<br />• User-defined distribution & sort keys are used to optimize for speed | • User-defined table-level partitions are optional.<br />• Data is automatically sorted, compressed and indexed into F3 format.<br />• Pruning at indexed data-range level. |
| Result cache                                                 | Multi-layered caching: local in-memory cache per cluster plus remote result cache (serverless only) that persists across all warehouses in workspace                                                                                                                                                                                                                                                 | Yes                                                                                                                                                                                                       | Yes, results and sub-results cache with transactional spoiling.                                                                                                                                                                                                                                |
| Warm cache (SSD)                                             | Yes. Delta cache for data read by queries at file level granularity                                                                                                                                                                                                                                                                                                                                  | Only with RA3 nodes at partition-level granularity                                                                                                                                                        | Yes, at indexed data-range level granularity                                                                                                                                                                                                                                                   |
| Support for semi-structured data & JSON functions within SQL | Yes                                                                                                                                                                                                                                                                                                                                                                                                  | Limited                                                                                                                                                                                                   | Yes, including Lambda expressions and native nested array structures                                                                                                                                                                                                                           |
| Vector Search and AI Capabilities                            | • MLflow integration and Databricks ML platform<br />• Native vector search in Delta Lake (Vector Search)<br />• AI and ML workloads optimized | Limited AI capabilities – primarily through integrations | • Native vector search capabilities and embeddings<br />• MCP Server for AI driven analytics<br />• Natural Language to SQL<br />• SQL based Inference |
| Query Optimizations                                          | • Photon engine (C++ vectorized engine providing 3-8x average speedups, maximum speedups over 10x)<br />• Automated stats collection (January 2025) enables cost-based optimization<br />• Predictive I/O for faster point lookups and data updates<br />• Liquid Clustering (February 2025)<br />• Intelligent Workload Management (IWM) with AI-powered resource allocation<br />• Delta cache<br />• Materialized views support | • Basic query optimizer<br />• Materialized views<br />• Result caching<br />• ANALYZE for table statistics<br />• Workload management (WLM)<br />• Automated materialized views (AutoMV)<br />• AI-driven scaling (Serverless preview) | • Primary indexes, aggregating indexes, join indexes, sparse indexes<br />• Sub-plan result caching<br />• F3 storage format optimization<br />• Automatic query optimizer with aggressive pruning<br />• Late column materialization<br />• Query analysis tools based on execution telemetry |

**Redshift** provides a result cache for accelerating repetitive query workloads and has more tuning options than some others. But it does not deliver much faster compute performance than other cloud data warehouses in benchmarks. Sort keys can be used to optimize performance, but their contribution is limited. There is no support for indexes, and low-latency analytics at large data volumes is hard to achieve. Because Redshift's decoupling of storage & compute is limited compared to other cloud data warehouses, it doesn't support isolating workloads, which means performance can degrade under pressure and competition for resources.

**Databricks** is designed to leverage the Spark framework for processing large volumes of data. It leverages compressed Parquet files in a Delta Lake. To reduce the amount of data processed, it uses data pruning on partitions and Parquet file metadata. Databricks does not provide any indexes.
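The idea of pruning on file metadata can be illustrated with a minimal sketch of min/max-based file skipping. This is a generic illustration of the technique, not Databricks or Delta Lake code; the `FileStats` class, the file names, and the statistics are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Hypothetical per-file metadata: min and max of one column."""
    path: str
    min_value: int
    max_value: int


def prune_files(files: list[FileStats], lo: int, hi: int) -> list[FileStats]:
    """Keep only files whose [min, max] range overlaps the query range [lo, hi]."""
    return [f for f in files if f.max_value >= lo and f.min_value <= hi]


# Made-up statistics for three data files.
files = [
    FileStats("part-0.parquet", 1, 100),
    FileStats("part-1.parquet", 101, 200),
    FileStats("part-2.parquet", 201, 300),
]

# A filter such as "value BETWEEN 150 AND 220" only needs two of the three files.
selected = prune_files(files, 150, 220)
print([f.path for f in selected])
```

Here the first file is skipped without being read, because its maximum value is below the lower bound of the filter. The granularity of the skip is still a whole file.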

**Firebolt** delivers the fastest query performance when compared to cloud data warehouses and services like Athena. Its unique approach to storage and indexing results in highly aggressive data pruning that scans dramatically less data compared to other technologies. While other technologies scan partitions or micro-partitions, Firebolt works with indexed data ranges that are significantly smaller. In addition, Firebolt lets users accelerate queries further with multiple index types (Aggregating index, Join index), and its decoupled storage & compute architecture makes it easy to isolate workloads to guarantee consistent performance.
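To make the notion of pruning at the level of small indexed ranges more concrete, here is a generic sketch of a sparse index over sorted data. It is a textbook-style illustration under simplifying assumptions (integer keys, fixed-size ranges), not Firebolt's actual F3 implementation; `RANGE_SIZE`, `build_sparse_index`, and `ranges_to_scan` are hypothetical names.

```python
from bisect import bisect_left, bisect_right

RANGE_SIZE = 4  # hypothetical number of rows per indexed range


def build_sparse_index(sorted_keys: list[int], range_size: int = RANGE_SIZE) -> list[int]:
    """Store only the first key of each fixed-size range of sorted rows."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), range_size)]


def ranges_to_scan(index: list[int], lo: int, hi: int) -> list[int]:
    """Return the ids of ranges that may contain keys in [lo, hi].

    Conservative on the lower bound: it may include one extra leading range
    so that duplicate keys spanning a range boundary are never missed.
    """
    # Last range whose first key is strictly below lo could still hold lo.
    start = max(bisect_left(index, lo) - 1, 0)
    # Ranges whose first key is above hi cannot contain a match.
    end = bisect_right(index, hi)
    return list(range(start, end))


keys = list(range(16))            # 16 sorted rows with keys 0..15
index = build_sparse_index(keys)  # first keys of four ranges: 0, 4, 8, 12

# A filter such as "key BETWEEN 5 AND 9" touches two of the four ranges.
print(ranges_to_scan(index, 5, 9))
```

Because the index holds one entry per range rather than one per row, it stays small enough to keep in memory, and the amount of data scanned depends on the range size rather than on the size of a partition or file.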

## Use cases [#use-cases]

There are a host of different analytics use cases that can be supported by a data warehouse. Look at your legacy technologies and their workloads, as well as the new possible use cases, and figure out which ones you will need to support in the next few years.

| Feature                                                                      | Databricks                                                                                                                                                                                                                                                                                                                                                   | Redshift                                                                                                                                                                                                                                       | Firebolt                                                                                                                                                                                                                                                                                                                       |
| ---------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Low-latency dashboards                                                       | • Sub-second to seconds load times at TB+ scale<br />• Enhanced by Photon engine (3-8x average speedups) and Delta cache<br />• Serverless SQL warehouses provide rapid startup (2-6 seconds)<br />• Performance depends on cluster configuration | • Seconds to tens of seconds load times at 100s of GB scale<br />• Can achieve faster performance with Concurrency Scaling and proper tuning | • 120ms query latency at 4000 QPS (FireScale benchmark 2025)<br />• Sub-second performance at TB+ scale with proper indexing<br />• Built for AI-driven analytics, dashboards, and real-time analytic applications |
| Enterprise BI                                                                | • Strong for data science and ML workloads<br />• Unified analytics platform approach<br />• Growing traditional BI integrations<br />• Serverless SQL warehouses improve accessibility<br />• Delta sharing capabilities | • Mature and comprehensive Enterprise DW feature set<br />• Extensive integrations with Enterprise BI ecosystem<br />• Strong AWS ecosystem integration | • Growing ecosystem with focus on modern BI tools<br />• Strong SQL compliance with PostgreSQL<br />• Wire level compatibility drives expansion to PostgreSQL BI and ETL ecosystem |
| Data Apps and AI Applications (Customer-facing low-latency high concurrency) | • 10 concurrent queries per cluster, scaling to 400 total concurrent queries per warehouse<br />• Real-world performance degradation typically occurs at 50-150 concurrent queries depending on workload complexity<br />• Serverless provides near-instant autoscaling<br />• Photon engine delivers 3-8x performance improvements<br />• Strong ML and AI platform integration | • 5 concurrent queries per WLM queue by default (up to 8 queues)<br />• Concurrency Scaling enables thousands of concurrent queries<br />• Seconds-level response times typical<br />• Automatic scaling for burst workloads<br />• Limited AI application support | • 120ms latency at 4000+ QPS proven performance at TB+ scale<br />• Supports hundreds to thousands of concurrent queries on single engine<br />• Price-performance leader (8x better than Snowflake, 18x vs Redshift)<br />• Purpose-built for AI agents and data-intensive applications<br />• Native vector search and embeddings |
| Ad hoc                                                                       | • Excellent for ad-hoc with decoupled storage/compute<br />• Serverless SQL warehouses provide instant provisioning<br />• Intelligent Workload Management handles unpredictable workloads automatically<br />• Strong for exploratory data analysis and ML workloads<br />• Automated stats collection improves query planning | • Performance dependent on predefined distribution & sort keys<br />• Elastic Resize enables adding compute resources<br />• Typically subset of data loaded for ad-hoc analysis | • Excellent performance out-of-the-box with engine optimized for star and snowflake joins and aggregations<br />• Self learning query plan optimizer<br />• Full workload isolation prevents ad-hoc complexity from affecting real-time workloads<br />• Aggregating indexes are automatically used by optimizer |

**Redshift** was originally designed to support traditional internal BI reporting and dashboard use cases for analysts. As such, it is typically used as a general-purpose Enterprise data warehouse. With deep integrations into the AWS ecosystem, it can also leverage AWS ML services, which makes it useful for ML projects. However, given the coupling of storage & compute, and the difficulty in delivering low-latency analytics at scale, it is less suited for operational use cases and customer-facing use cases like Data Apps. The coupling of storage and compute, together with the need to predefine sort & dist keys for optimal performance, makes it challenging to use for ad-hoc analytics.

**Databricks** is a mature Spark-based platform proven for processing streaming data. It is widely used for machine learning use cases by data scientists through integrated notebooks. From a low-latency query perspective, while it offers features like Delta Cache, it does not provide specialized indexes that can deliver low-latency queries.

**Firebolt** stands out as the fastest cloud data warehouse when compared to Snowflake, Redshift, BigQuery and Athena. It's great for delivering sub-second analytics at scale, while remaining hardware efficient and high-concurrency friendly. This makes it a great choice for operational use cases and customer-facing data apps. Because it is not as rich in features and integrations as the more mature data warehouses, it is a lesser fit for a general-purpose Enterprise data warehouse. It is also not the best fit for ad-hoc use cases, because of the need to predefine indexing at the table level.