Data Warehousing on AWS

Amazon Web Services (AWS) offers a wide range of services used by more than 1 million customers worldwide. Core infrastructure services such as Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3) are the building blocks for higher-level analytics services such as Amazon Redshift.

Amazon Redshift was one of the first cloud-based data warehouses, built on AWS’ compute and storage infrastructure. Like other first-generation cloud data warehouses, it let organizations run a data warehouse as a managed service instead of provisioning and maintaining their own hardware. Redshift has since added architectural improvements, including decoupled compute and storage, autoscaling, automatic pause/resume, and workload isolation, that make data warehouses on AWS simpler to run, more scalable, and more performant.

Since the introduction of Redshift, new data warehouse offerings, such as Snowflake and Firebolt, have also emerged on the AWS Marketplace.

Data Management and Analytics Services on AWS

AWS offers a variety of data management engines and tools that help organizations store, process, and analyze data. The services listed for each category below are representative, not exhaustive.

  • Managed databases and storage: Amazon S3, Amazon DynamoDB, and Amazon Relational Database Service (RDS)
  • Streaming and event processing: AWS Lambda, Amazon Kinesis Data Streams, Amazon Kinesis Data Firehose, and Amazon Kinesis Data Analytics (now Amazon Managed Service for Apache Flink)
  • Data processing and ETL: Amazon EMR (previously known as Elastic MapReduce), AWS Glue, and AWS Data Pipeline
  • Data analytics and business intelligence: Amazon QuickSight and Amazon Athena

Partner offerings extend what organizations can do with a cloud data warehouse on AWS. They include Fivetran and Matillion for data integration and ingestion; dbt, a platform-independent data transformation tool; and Tableau and Looker for business intelligence. These are just a few of the add-ons available to organizations using AWS.

Getting Started with a Data Warehouse on AWS

The first step is to review the available options and choose the data warehouse that fits your requirements, in particular your data volume, budget, and query complexity. Amazon Redshift, Snowflake, and Firebolt all handle structured and semi-structured data, and Firebolt in particular is designed for high-performance analytics on large datasets. Amazon Athena is an option when data already stored in Amazon S3 needs to be queried interactively with SQL, without first loading it into a warehouse.

Signing up for these services, either from the AWS console or through AWS Marketplace, is well documented, and each lets you configure the service to suit your requirements. Workflows differ between offerings, but they are typically straightforward and let you get a minimum viable product running quickly. Laying the groundwork for security, governance, and cloud connectivity, however, typically takes more time and expertise to ensure that the deployment meets enterprise requirements.

Once the data warehouse is set up, the next critical step is loading data into it. A common pattern is to stage files in Amazon S3 (object storage) and load them with the warehouse’s bulk-load command, such as Redshift’s COPY. Other options include AWS Glue (a managed ETL service), third-party tools such as Talend, Informatica, and Apache NiFi, and warehouse-specific features such as Firebolt’s built-in ingestion capabilities.
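As a minimal sketch of the stage-in-S3-then-COPY pattern, the Python below builds a Redshift COPY statement and submits it through the Redshift Data API. The helper names (build_copy_sql, run_copy) and the table, bucket, IAM role ARN, workgroup, and database values are hypothetical placeholders, and client is assumed to behave like boto3.client("redshift-data"). The sketch targets Redshift Serverless via WorkgroupName; a provisioned cluster would pass ClusterIdentifier plus credentials instead.

```python
import re
import time
from typing import Any

# Hypothetical validation rule: accept "table" or "schema.table" made of letters,
# digits, and underscores, so the name can be interpolated into SQL safely.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")


def build_copy_sql(table: str, s3_uri: str, iam_role_arn: str, fmt: str = "PARQUET") -> str:
    """Return a Redshift COPY statement that loads every file under an S3 prefix."""
    if not _IDENT.match(table):
        raise ValueError(f"unsafe table name: {table!r}")
    if not s3_uri.startswith("s3://"):
        raise ValueError("s3_uri must start with s3://")
    fmt = fmt.upper()
    if fmt == "PARQUET":
        options = "FORMAT AS PARQUET"
    elif fmt == "CSV":
        options = "FORMAT AS CSV IGNOREHEADER 1"  # skip a single header row
    else:
        raise ValueError(f"unsupported format: {fmt}")
    # Escape single quotes inside SQL string literals by doubling them.
    uri = s3_uri.replace("'", "''")
    role = iam_role_arn.replace("'", "''")
    return f"COPY {table} FROM '{uri}' IAM_ROLE '{role}' {options};"


def run_copy(client: Any, sql: str, workgroup: str, database: str,
             poll_seconds: float = 2.0) -> str:
    """Submit SQL with the Redshift Data API and block until it reaches a final state."""
    statement_id = client.execute_statement(
        WorkgroupName=workgroup, Database=database, Sql=sql
    )["Id"]
    while True:
        status = client.describe_statement(Id=statement_id)["Status"]
        if status in ("FINISHED", "FAILED", "ABORTED"):
            return status
        time.sleep(poll_seconds)


if __name__ == "__main__":
    # Placeholder values only; running run_copy for real needs AWS credentials.
    print(build_copy_sql(
        "sales.orders",
        "s3://example-bucket/orders/",
        "arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole",
    ))
```

In a real account, client would be boto3.client("redshift-data"), and the IAM role must allow Redshift to read the S3 prefix. Polling keeps the sketch simple; a production pipeline might rely on an orchestration tool or AWS Glue instead.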

Finally, you’ll need to analyze the data loaded into the warehouse. Common tools for this include SQL clients such as SQL Workbench/J and DBeaver, as well as Firebolt’s own SQL editor. AWS also offers analytics services such as Amazon Athena (serverless interactive SQL), Amazon EMR (which also has a serverless option), and Amazon OpenSearch Service (a managed service for OpenSearch, the open-source fork of Elasticsearch started by AWS).
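For the Athena option, a minimal sketch of running an interactive SQL query from Python might look like the following. It assumes client behaves like boto3.client("athena"); the helper name run_athena_query and the database, table, and S3 results location are hypothetical. The query is started, polled until Athena reports a terminal state, and its results are paged through with NextToken.

```python
import time
from typing import Any, Dict, List


def run_athena_query(client: Any, sql: str, database: str, output_s3: str,
                     poll_seconds: float = 1.0) -> List[List[str]]:
    """Run a query in Athena and return all result rows as lists of strings."""
    qid = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_s3},  # where Athena writes results
    )["QueryExecutionId"]

    # Poll until Athena reports a terminal state.
    while True:
        status = client.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]
        state = status["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        reason = status.get("StateChangeReason", "")
        raise RuntimeError(f"Athena query {qid} ended in {state}: {reason}")

    # Page through results. For SELECT queries the first row holds column names;
    # NULL cells carry no VarCharValue, so they are returned as "".
    rows: List[List[str]] = []
    kwargs: Dict[str, str] = {"QueryExecutionId": qid}
    while True:
        page = client.get_query_results(**kwargs)
        for row in page["ResultSet"]["Rows"]:
            rows.append([cell.get("VarCharValue", "") for cell in row["Data"]])
        token = page.get("NextToken")
        if not token:
            return rows
        kwargs["NextToken"] = token
```

With real credentials, a call might look like run_athena_query(boto3.client("athena"), "SELECT region, SUM(total) FROM sales GROUP BY region", "analytics", "s3://example-athena-results/"), where the output location must be writable by the caller. Returning plain string lists keeps the sketch dependency-free; converting values to numbers or dates is left to the caller.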

Looker, QlikView, QuickSight, and Tableau are BI tools for building dashboards and data visualizations.

Scalability and customization remain key selling points for cloud data warehousing on AWS. Overall, the data warehouse options on AWS, together with the surrounding AWS services and partner tools, give organizations a strong foundation for becoming data-driven.
