What is a cloud data warehouse

A cloud data warehouse is any data warehouse offered as a service. It could be a version of an on-premises data warehouse hosted as a managed service, like Vertica or Teradata. It could be a cloud service built using on-premises technology, such as Redshift, which was built using ParAccel. Or it could be a cloud-native data warehouse, built from the ground up as a cloud service, such as Google BigQuery, Snowflake, or Firebolt.

There are also query engines such as Amazon Athena. While a cloud data warehouse is both storage and compute, a query engine is just the compute: the data stays where it already lives. Query engines are a viable alternative whenever you cannot move the data into a cloud data warehouse and performance is not as important. But a well-optimized federated query engine will not be as fast as a well-optimized data warehouse.

Challenges

Beyond the standard challenges with a data warehouse, the biggest challenge with a cloud data warehouse is having your data in the cloud, outside the business. You may have regulatory requirements that prevent cloud adoption for certain types of data.

But if security concerns are the reason you are not adopting cloud services, you are in trouble, because you are almost certainly already using the cloud for something. That means you are already exposed. You need to rethink your security both to protect yourself now and to be ready to adopt cloud technologies the right way. Cloud security is now a well-understood problem with established solutions.

If you are worried about moving a lot of data when you load or query the data warehouse, those problems have also been solved. It just requires some research.

You do need to be ready, so make it a top priority now: architect your security for the cloud, and engineer data movement for hybrid cloud.

Benefits

One technical benefit of modern cloud data warehouses is elastic scalability, which comes from an architecture that decouples storage and compute. Because each can scale independently, you can grow data volumes, query load, and user counts without resizing the other.

You get many other technical benefits as well. Cloud data warehouses are where most of the innovation is happening with analytics, including various integrations with the latest data integration technologies, BI tools, and machine and deep learning technologies.

Cloud data warehouses are also helping expand the reach of analytics to employees and your customers. Some cloud data warehouses, like Firebolt, also improve performance and price-performance, not just scalability. Queries that run in 1 second or less, lower costs, and support for thousands of concurrent users mean analytics can be used by the masses.

Another big benefit is that cloud data warehouses make it easier to replace a data warehouse at any time. Many data pipelines now use a data lake as the batch loading source for the cloud data warehouse, and a SQL-based ELT process for loading the data warehouse. Once you have that, you can easily add or replace any analytics engine in days or weeks.
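The load pattern described above can be sketched in a few lines. This is a minimal illustration, not a production pipeline: Python's built-in sqlite3 stands in for the cloud data warehouse, an in-memory CSV string stands in for a file in the data lake, and the table and column names (staging_orders, customer_revenue) are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw export sitting in a data lake (an in-memory CSV here).
RAW_CSV = """order_id,customer,amount,status
1,acme,120.50,complete
2,acme,80.00,cancelled
3,globex,42.25,complete
"""


def extract_and_load(conn, raw_csv):
    """E and L: copy the raw rows into a staging table without transforming them."""
    conn.execute(
        "CREATE TABLE staging_orders "
        "(order_id TEXT, customer TEXT, amount TEXT, status TEXT)"
    )
    rows = list(csv.DictReader(io.StringIO(raw_csv)))
    conn.executemany(
        "INSERT INTO staging_orders VALUES (:order_id, :customer, :amount, :status)",
        rows,
    )
    return len(rows)


def transform(conn):
    """T: build the analytics table with SQL, inside the warehouse itself."""
    conn.execute(
        """
        CREATE TABLE customer_revenue AS
        SELECT customer,
               SUM(CAST(amount AS REAL)) AS revenue,
               COUNT(*) AS orders
        FROM staging_orders
        WHERE status = 'complete'
        GROUP BY customer
        """
    )


def run_elt():
    """Run the whole sketch against a throwaway in-memory database."""
    conn = sqlite3.connect(":memory:")
    extract_and_load(conn, RAW_CSV)
    transform(conn)
    return conn.execute(
        "SELECT customer, revenue, orders FROM customer_revenue ORDER BY customer"
    ).fetchall()
```

Because the transformation is plain SQL and the source is a file in the lake, pointing the same two steps at a different engine mostly means changing the connection and adjusting dialect details, which is what makes the swap comparatively cheap.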

Cloud data warehouse vs cloud query engine

The biggest decision you need to make with a cloud data warehouse is what type to use:

  • A more traditional cloud data warehouse like Redshift, which does not decouple storage and compute.
  • A federated query engine like Presto, where you leave your data in place and query remotely. With Presto, you manage it on your own.
  • A modern cloud data warehouse like Snowflake or Firebolt, which does decouple storage and compute, and where you choose your own dedicated resources.
  • A serverless data warehouse like BigQuery, which provisions its resources dynamically, often from shared resources.
  • A serverless query engine like Athena, which is basically Presto as a service, or Redshift Spectrum, which is a virtual private Athena for Redshift users.

That’s a longer conversation.
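As a starting point for that conversation, the five options above can be written down as a small lookup table and filtered by two yes/no questions. This is only a sketch: the attribute names and the shortlist helper are hypothetical, the values simply restate the list above, and None marks anything the list does not state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Option:
    label: str
    examples: tuple
    stores_data: bool          # True: warehouse (storage + compute); False: query engine
    serverless: bool           # True: the service provisions resources dynamically
    decoupled: Optional[bool]  # storage/compute decoupling; None where the list does not say


# Values restate the bullet list above; nothing here is measured or benchmarked.
OPTIONS = (
    Option("traditional cloud data warehouse", ("Redshift",), True, False, False),
    Option("federated query engine", ("Presto",), False, False, None),
    Option("modern cloud data warehouse", ("Snowflake", "Firebolt"), True, False, True),
    Option("serverless data warehouse", ("BigQuery",), True, True, None),
    Option("serverless query engine", ("Athena", "Redshift Spectrum"), False, True, None),
)


def shortlist(must_leave_data_in_place, want_serverless):
    """Return the labels of options that match two yes/no requirements."""
    return [
        o.label
        for o in OPTIONS
        if o.stores_data != must_leave_data_in_place
        and o.serverless == want_serverless
    ]
```

For example, shortlist(True, True) narrows the field to the serverless query engines, while shortlist(False, False) leaves the two warehouse types where you manage dedicated resources. Real selection involves more criteria, such as performance, cost, and concurrency.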

Firebolt data warehouse

Firebolt provides the fastest cloud data warehouse, with the performance to support ad hoc and high-performance analytics at scale, as well as semi-structured data analytics, not just traditional reporting and dashboards. Its modern decoupled storage and compute architecture improves scalability and simplifies administration, and it also allows companies to easily support high user concurrency.