The Fireblog

Matthew Weingarten from Disney Streaming about Data Quality Best Practices

Principles essential for data quality, cost optimization, and data modeling, as adopted by the world's leading companies

Firebolt Team

Joseph Machado, Senior Data Engineer at LinkedIn talks best practices

Data engineering should be less about the stack and more about best practices.

Firebolt Team

Professors Joe Hellerstein and Joseph Gonzalez on LLMs

Joe Hellerstein and Joseph Gonzalez inspired generations of database enthusiasts and are now on the show

Firebolt Team

Megan Lieu on powerful notebooks that enable collaboration

Megan Lieu on her approach to data advocacy and the power of notebooks

Firebolt Team

Transitioning from software engineering to data engineering

Every data team should have at least one data engineer with a software engineering background.

Firebolt Team

The key is in the key.

One of the more common and costly mistakes in many data implementations is confusion about keys.

Robert Harmon

Simplifying time variance in a SQL data warehouse

An issue many newcomers to the data warehouse world have difficulty with is managing time variance efficiently at scale.

Robert Harmon

"Do data architects exist anymore?"

"Do data architects exist anymore?" Wow, as a recovering data architect that's a loaded question.

Robert Harmon

Who's down with OBT? I can assure you, not me.

I'm not a fan of dimensional modeling. It exists to solve physical problems, not logical problems.

Robert Harmon

Rob's high performance data warehousing rule #4: Delete nothing, update only metadata.

Rob says: delete nothing, update only metadata.

Robert Harmon

Vin Vashishta explains why we should stop using dashboards

Vin Vashishta, the guy we all love to follow, has never seen a dashboard with positive ROI.

Firebolt Team

Rob's high performance data warehousing rule #3: Strong operational data store ensures high performance

This has nothing to do with the DW itself. But if you miss it, you'll fail with your warehouse project.

Robert Harmon

Rob's high performance data warehousing rule #2: There's no point in measuring anything, if the data team can't measure itself.

"There's no point in measuring anything, if the data team can't measure itself."

Robert Harmon

Rob's high performance data warehousing rule #1: if you cannot constrain a thing, you cannot ingest that thing.

"If you cannot constrain a thing, you cannot ingest that thing."

Robert Harmon

Joe Reis and Matt Housley on the fundamentals of data engineering

Joe Reis and Matt Housley joined the bros for some much-needed ranting, priceless data advice, and good laughs.

Firebolt Team

How IQVIA Maximizes Analytics Performance for Life Sciences

An IQVIA deep dive into maximizing the impact of BI solutions for faster and more informed decision-making in Life Sciences.

Firebolt Team

Bill Inmon, the Godfather of Data Warehousing

As people in the data industry go, Bill Inmon is among the top, often seen as the godfather of the data warehouse.

Firebolt Team

Large scale data engineering at Momentive.ai - Meenal Iyer

Meenal Iyer, VP Data at Momentive.ai, talks about enforcing collaboration in large organizations

Firebolt Team

Data engineering from the early 2000s till today - BlackRock

When it comes to data management, have we come a long way since the early 2000s?

Firebolt Team

Data Management Lifecycle in Firebolt

Learn what the data management lifecycle looks like in Firebolt.

Igor Stanko

A primer on analyzing semi-structured data (Part 2)

With the data ingested, let’s delve right into two popular frameworks for visualizing the data.

Firebolt Team

A primer on analyzing semi-structured data (Part 1)

This guide will provide you with the fundamental knowledge necessary to handle semi-structured data effectively. 

Firebolt Team

Distributed Query Execution in Firebolt

In this blog, we focus on distributed query execution as an integral part of Firebolt.

Benjamin Wagner
Lorenz Hübschle

Zach Wilson on what makes a great data engineer

How good you are at Spark or Flink ≠ how good you are at data engineering. Zach Wilson explains.

Firebolt Team

Data quality with ‘dbt’ and Firebolt

dbt data quality - Implementing data quality tests and using dbt extensions for enhanced data quality checks.

Robert Harmon

How ZipRecruiter and Yotpo power self-service data platforms that work

How ZipRecruiter and Yotpo build resilient self-service products that keep customers happy and engineers calm

Firebolt Team

25 Ad Tech Data Pros: Workshop Summary

In a recent workshop, 25 data pros working in the Ad Tech industry discussed querying large data sets efficiently

Matthew Darwin

How we mastered dbt: A true story

At Firebolt, we found out that a duet of dbt and Paradime works for our needs.

Olga Braginskaya

Data Observability with Millions of Users - Barr Moses

Barr Moses explains how to make sure your data is accurate in a world where so many different teams are accessing it

Firebolt Team

Analyzing the GitHub Events Dataset using Firebolt - Writing a Data App using Java

Writing a small data app using the Firebolt JDBC driver.

Alexander Reelsen

Analyzing the GitHub Events Dataset using Firebolt - Incremental Updates with Apache Airflow

Looking at the GithubArchive dataset of public events and leveraging Apache Airflow workflows to keep our data up to date.

Alexander Reelsen

Analyzing the GitHub Events Dataset using Firebolt - Using Jupyter for data exploration

In this blog we will discover the data using Streamlit and Jupyter and the Firebolt Python SDK.

Alexander Reelsen

Analyzing the GitHub Events Dataset using Firebolt - Querying with Streamlit

Writing a data app, using Streamlit and Jupyter and the Firebolt Python SDK. A multi-series blog.

Alexander Reelsen

Event streams in Firebolt

Event streams have always been problematic to analyze in SQL. This is how we do it.

Robert Harmon

How Amplitude Engineers Process 5 Trillion Real-time Events

Amplitude's cutting-edge data stack and how it processes 5 Trillion real-time events while dealing with mutable data

Firebolt Team

What is a Data App?

Data apps are applications that rely heavily on data and are easy to use.

Firebolt Academy

AWS re:Invent Keynote Recap for Data Professionals

AWS re:Invent 2022 was all about building anticipation and delivering on the expectations of us technologists.

Firebolt Team

Making Observability a Key Business Driver

80% of the code that you write doesn’t work on the first try. But knowing which 80% is not working is the real challenge

Firebolt Team

Semi-structured data modeling

How to ingest, store and query JSON data, for example, is a consistent question on the minds of customers.

Firebolt Team

PostgreSQL Swiss army knife and The analytics workload

Is Postgres truly the right engine for analytics? 

Firebolt Team

Firebolt and Data Mesh

Data Mesh is hot stuff. But from a technology perspective it’s still not very well defined.

Matthew Darwin

Big Data Analytics for Gaming

In our recent ‘Big Data Analytics for Gaming Workshop’ we let the audience do the talking. Here’s a summary of the talk.

Firebolt Team

A ClickHouse Review from a Practitioner’s Point of View

Sudeep Kumar, Principal Engineer at Salesforce, considers the shift to ClickHouse one of his biggest accomplishments

Firebolt Team

Hey David and Tristan, this is where Firebolt is at

"When I see David Jayatillake and Tristan Handy comment on Firebolt's approach it is clear that Firebolt is on track."

Robert Harmon

The Creator of Airflow About His Recipe for Smart Data-Driven Companies

Max walks the Bros through his recipe for a smart data-driven company, and the genesis of Airflow, Superset & Presto.

Firebolt Team

Druid Architecture Compared to Firebolt - A Practitioner’s View

Firebolt provides an alternative to Druid, delivering fast response times, high concurrency and the convenience of a Saa

Ben Hopp

Cloud data warehouse costs: Look before you leap

In this post, we look at factors to consider when building a data warehouse.

Firebolt Team

How Similarweb Delivers Customer Facing Analytics Over 100s of TBs

According to Yoav Shmaria, VP R&D Platform at Similarweb, the best way to manage data warehouse costs is tagging

Firebolt Team

Faster Data Replication from Kafka Using Hevo and Firebolt

How to Set Up Your Data Analytics Stack with Kafka, Hevo, and Firebolt.

Brian Bickell

A new level of efficiency in analytics

Are you spending more than you planned on your Data Warehouse? Analyze more. Use less compute resources.

Boaz Farkash

Loading Snowplow data into Firebolt with dbt

How to enable sub-second analysis across billions of rows of customer behavior data: Part I - Setting up the load

Todd Beauchene

How Klarna Designed a New Data Platform in the Cloud

Klarna is one of the leading fintech companies in the world, valued at $45B.

Firebolt Team

How Eventbrite is Modernizing its Data Stack

An episode about Eventbrite’s data stack modernization process, and how you get engineers to adopt new technologies

Firebolt Team

Simplicity and Power of Agg Indexes at Scale

One of the ways Firebolt is able to support data-driven applications is by leveraging aggregating indexes on the tables.

David Welch
Luka Lovosevic

A Deep Dive into Slack's Data Architecture

How the data platform evolved as Slack grew from a startup to an IPOed and then acquired company.

Firebolt Team

Transitioning Scopely’s 5.5 PB Data Platform to the Modern Data Stack

Should data engineering AND BI be handled by the same people?

Firebolt Team

Getting Rid of Raw Data with Jens Larsson

Why would you create ugly data? According to Jens Larsson, don’t even go near raw data.

Firebolt Team

5 steps to debug your complex SQL queries in Firebolt

Let us guide you through the process of identifying the performance bottlenecks in your query in just 5 simple steps.

Matan Sarig
Roy Hegdish

How Zendesk engineers manage customer-facing data applications

Ananth Packkildurai is Principal Software Engineer at Zendesk and runs one of the strongest newsletters in data

Firebolt Team

Future of Performance is Not About Performance

The data warehousing market has gone absolutely mad over performance. Why is this the case?

Tino Tereshko

SQL: Thinking in Lambdas

Many programming languages are imperative – tell the compiler how to operate by providing the instructions in order.

Octavian Zarzu

Firebolt Announces Series C Round at $1.4 Billion Valuation to Build the World's Fastest and Most Versatile Cloud Data Warehouse

Demand from engineering teams has skyrocketed since Firebolt emerged from stealth last year

Firebolt Team

How are those data intensive customer facing apps engineered at Gong?

Gong manages hundreds of thousands of videoconferences and millions of emails PER DAY, which add up to hundreds of TBs.

Firebolt Team

How Bolt Engineers Are Designing Its Next-Gen Data Platform

Bolt engineers are in the midst of designing a new next-gen data platform

Firebolt Team

Firebolt Indexes in Action (clustered and non-clustered)

Indexes are the primary way for users to accelerate query performance in Firebolt. Learn about them here.

Octavian Zarzu

How did Agoda scale its data platform to support 1.5T events per day?

Scaling a data platform to support 1.5T events per day requires complicated technical migrations

Firebolt Team

Cloud Data Warehouse: The Hitchhikers Guide

Everything you needed to know about cloud data warehouses but were afraid to ask...

Firebolt Academy

Postgres and MySQL for Analytics - Meeting the 1 second SLA

Learn when to use Postgres, MySQL, in-memory databases, HTAP, or data warehouses to meet the 1 sec SLA in analytics.

Robert Meyer

Diving Into GitHub's Data Stack

It’s the mother of all development projects. You use it daily. And so do 65M developers around the world.

Firebolt Team

Top 10 Ways to Improve Cloud Data Warehouse Performance (And how it’s done in Firebolt)

Learn the top 10 tips for improving your cloud data warehouse performance.

Robert Meyer

Building Data Products For Data Engineers

What does a tech stack that always needs to be at the forefront of technology look like?

Firebolt Team

Snowflake vs Databricks vs Firebolt

More and more, people are asking me “how do you compare Snowflake and Databricks?” We did our best to answer.

Robert Meyer

How Vimeo Keeps Data Intact with 85B Events Per Month

How does Vimeo handle Data Ops to deal with massive scale?

Firebolt Team

How Substack's Data Platform Supports 500K Paying Subscribers

How does Substack's data platform support 500K paying subscribers?

Firebolt Team

A Technical Deep Dive to Yelp's Data Infrastructure — with Steven Moy

Steven Moy thoroughly explains Yelp’s data architecture under the hood and how it evolved over the past ten years.

Firebolt Team

How do Canva's engineers and analysts scale data platforms to keep up with growth? — with Krishna Naidu

Canva is one of the hottest, if not the hottest, graphic design platforms out there.

Firebolt Team

How AppsFlyer manages scale without sacrificing performance

AppsFlyer not only deals with 120 billion events per day, but does so while growing quickly as a company

Firebolt Team

Firebolt Ignites Growth with a $127M Series B Funding Round

Upstart cloud data warehouse sees rapid growth in 2021, plans to double its workforce

Firebolt Team

Amazon Athena 2.0 (Presto 0.217) - What’s New

Amazon Athena engine version 2 - what’s new and big enough to call this a 2.0 release?

Robert Meyer

The real meaning of a data lake

Making sense of data lakes, Delta Lake, lakehouses, data warehouses, and more.

Robert Meyer

JSON the SQL: Choosing the best data warehouse for semi-structured data

Working with semi-structured data can be more like a Jason (horror movie) Sequel than JSON SQL.

Robert Meyer

ETL Vs. ELT - Know The Differences

Explore the significant differences between ELT and ETL data integration processes and find the best option for you.

Firebolt Team

How to accelerate Looker performance on Redshift, Snowflake and BigQuery

How to accelerate Looker performance on Redshift, Snowflake and BigQuery? Short-term fixes and the long-term solutions.

Robert Meyer

The Double Redshift: About Redshift Alternatives and Limits

When do you need to shift from Redshift, and what are the alternatives? Learn here.

Robert Meyer

How to upgrade from Tableau extracts to a fast Tableau live connection

Learn how to upgrade from Tableau extracts to Tableau live connection to deliver sub-seconds performance every time.

Robert Meyer

AWS Athena Error: Query exhausted resources at this scale factor

If you’re using Amazon Athena, you may have seen these errors. About AWS Athena errors and how to deal with them.

Robert Meyer

Snowflake vs. Redshift (vs. Firebolt)

A detailed comparison of Snowflake vs. Redshift, by architecture, scalability, performance, use cases and cost.

Robert Meyer

Athena vs. Redshift Spectrum vs. Presto

Learn some simple rules of thumb you can use to choose the best federated query engine for your company's needs.

Robert Meyer

Choosing Between The Best Federated Query Engine And a Data Warehouse

How companies should avoid creating a slow, many-headed federated Gorgon out of Athena.

Robert Meyer

Why even simple queries can be slow (but not with Firebolt)

Why even simple queries can be slow in cloud data warehouses, and how Firebolt uses indexing to prune data and stay fast.

Boaz Farkash

How To Support Ad Hoc Analysis - Part 2

How to support ad hoc analysis - Part 2: The right ad hoc analytics architecture

Robert Meyer

How To Support Ad Hoc Analysis - Part 1

How to support ad hoc analysis: Part 1 - The 4 requirements for an ad hoc analytics architecture

Robert Meyer

The data hitchhiker’s guide to cloud analytics

"In the beginning, there was a data mess". Don’t Panic, just read our data hitchhiker’s guide to cloud analytics.

Robert Meyer