Clickhouse was originally developed at Yandex, the Russian search engine, as an OLAP engine for low latency analytics. It was built as an on-premise solution with coupled storage & compute, and a large variety of tuning options in the form of indexes and and merge trees. Clickhouse’s architecture is famous for its focus on performance and low-latency queries. The tradeoff is that it is considered very difficult to work with. SQL support is very limited, and tuning/running it requires significant engineering resources.
Snowflake was one of the first decoupled storage and compute architectures, making it the first to have nearly unlimited compute scale and workload isolation, and horizontal user scalability. It runs on AWS, Azure and GCP. It is multi-tenant over shared resources in nature and requires you to move data out of your VPC and into the Snowflake cloud. “Virtual Private Snowflake” (VPS) is its highest-priced tier, and can run a dedicated isolated version of Snowflake. Its virtual warehouses can be T-shirt sized along an XS/S/M…/4XL axis, where each discrete T-shirt size is bundled with fixed HW properties that are abstracted from the users.
Clickhouse doesn’t offer any dedicated scaling features or mechanisms. While it can deliver linearly scalable performance for some types of queries, scaling itself has to be done manually. Hardware is self-managed in Clickhouse. This means that to scale you would have to provision a cluster and migrate.
Snowflake scales very well both for data volumes and query concurrency. The decoupled storage/compute architecture supports resizing clusters without downtime, and in addition, supports auto-scaling horizontally for higher query concurrency during peak hours.
Clickhouse is famous for being one of the fastest local runtimes ever built for OLAP workloads. Its columnar storage, compression and indexing capabilities make it a consistent leader in benchmarks. Its lack of support for standard SQL and lack of query optimizer means that it’s less suitable for traditional BI workloads, and more suitable for engineering managed workloads. While fast, it requires a lot of tuning and optimization.
Snowflake typically comes on top for most queries when it comes to performance in public TPC-based benchmarks when compared to BigQuery and Redshift, but only marginally. Its micro partition storage approach effectively scans less data compared to larger partitions. The ability to isolate workloads over the decoupled storage & compute architecture lets you avoid competition for resources compared to multi-tenant shared resource solutions, and the ability to increase warehouse sizes can often enhance performance (for a higher price), but not always linearly. Snowflake’s recently released “Search optimization service” delivers index-like behavior for point queries, but comes at an additional cost.
Clickhouse was not designed to be a data warehouse, but rather a low-latency query execution runtime. Managing it typically requires significant engineering overhead. Hence, it’s a good fit for engineering managed operational use cases and customer-facing data apps, where low latency matters. It is not a good fit for a general purpose data warehouse, nor for Ad-Hoc analytics or ELT.
Snowflake is a well rounded general purpose cloud data warehouse, that can also span beyond traditional BI & Analytics use cases into Ad-Hoc and ML use cases. Thanks to the flexible decoupeld storage & compute architecture that allows you to isolate and control the amount of compute per workload, it’s possible to tackle a broad spectrum of workloads. However, like its close siblings Redshift & BigQuery, it struggles to deliver low-latency query performance at scale, making it a lesser fit for operational use cases and customer-facing data apps.