Clickhouse was originally developed at Yandex, the Russian search engine, as an OLAP engine for low latency analytics. It was built as an on-premise solution with coupled storage & compute, and a large variety of tuning options in the form of indexes and and merge trees. Clickhouse’s architecture is famous for its focus on performance and low-latency queries. The tradeoff is that it is considered very difficult to work with. SQL support is very limited, and tuning/running it requires significant engineering resources.
Redshift has the oldest architecture, being the first Cloud DW in the group. Its architecture wasn’t designed to separate storage & compute. While it now has RA3 nodes which allow you to scale compute and only cache the data you need locally, all compute still operates together. You cannot separate and isolate different workloads over the same data, which puts it behind other decoupled storage/compute architectures. Redshift runs as an isolated tenant per customer, and unlike other cloud data warehouses, it is deployed in your VPC.
Clickhouse doesn’t offer any dedicated scaling features or mechanisms. While it can deliver linearly scalable performance for some types of queries, scaling itself has to be done manually. Hardware is self-managed in Clickhouse. This means that to scale you would have to provision a cluster and migrate.
Redshift is limited in scale because even with RA3, it cannot distribute different workloads across clusters. While it can scale to up to 10 clusters automatically to support query concurrency, it can only handle a maximum of 50 queued queries across all clusters by default.
Clickhouse is famous for being one of the fastest local runtimes ever built for OLAP workloads. Its columnar storage, compression and indexing capabilities make it a consistent leader in benchmarks. Its lack of support for standard SQL and lack of query optimizer means that it’s less suitable for traditional BI workloads, and more suitable for engineering managed workloads. While fast, it requires a lot of tuning and optimization.
Redshift does provide a result cache for accelerating repetitive query workloads and also has more tuning options than some others. But it does not deliver much faster compute performance than other cloud data warehouses in benchmarks. Sort keys can be used to optimize performance, but their contribution is limited. There is no support for indexes, and low-latency analytics at large data volumes is hard to achieve. Because Redshift decoupling of storage & compute is limited compared to other cloud data warehouses, it doesn’t support isolating workloads, which means performance can degrade under pressure and competition for resources.
Clickhouse was not designed to be a data warehouse, but rather a low-latency query execution runtime. Managing it typically requires significant engineering overhead. Hence, it’s a good fit for engineering managed operational use cases and customer-facing data apps, where low latency matters. It is not a good fit for a general purpose data warehouse, nor for Ad-Hoc analytics or ELT.
Redshift was originally designed to support traditional internal BI reporting and dashboard use cases for analysts. As such, it is typically used as a general-purpose Enterprise data warehouse. With deep integrations into the AWS ecosystem, it can also leverage AWS ML service, making it also useful for ML projects. However, given the coupling of storage & compute, and the difficulty in delivering low-latency analytics at scale, it is less suited for operational use cases and customer-facing use cases like Data Apps. The coupling of storage and compute, together with the need to predefine sort & dist keys for optimal performance, make it challenging to use for Ad-Hoc analytics.