Technical Deep Dive: Automated Column Statistics · Product

Firebolt’s automated column statistics keep optimizer insights up to date, improving query plans and performance automatically—no query changes required.

The $100M Problem: How Lyft's Data Platform Prevents ML Failures with Ritesh Varyani at Lyft · Podcast

Lyft's Ritesh Varyani on building a unified data platform (Spark, Trino, ClickHouse) balancing OSS & AI reliability.

60 Billion Predictions Daily: Inside Credit Karma’s Agentic Data Layer · Podcast

Maddie Daianu (Head of Data & AI) shares Credit Karma's multi-cloud strategy using BigQuery and an agentic layer.

Technical Deep Dive: Efficient and ACID Compliant Vector Search Indexes in Firebolt · Product

Pruning even more data with late materialization · Product

Learn how Late Materialization speeds up top-K queries by delaying column scans.

Block Bad Data Before the Write with Nike’s Ashok Singamaneni

Ashok Singamaneni built Spark Expectations at Nike to enforce data quality and reduce recomputes in production data pipelines.

FuzzBerg: Hunting Bugs in Iceberg and file-format readers · Product

Firebolt's FuzzBerg accelerates security testing of Iceberg and other file-based readers.

Postgres vs. Elasticsearch: The Unexpected Winner in High-Stakes Search for Instacart

Learn how Instacart's team moved from Elasticsearch to a custom Postgres setup, improving search efficiency and control.

Firebolt ARM Rollout · Product

Firebolt's ARM rollout: optimising price-performance.

Implementing Explicit Multi-Statement Transactions in a Stateless, Cloud-Native Architecture · Product

Firebolt supports explicit, multi-statement transactions using BEGIN, COMMIT, and ROLLBACK syntax while maintaining ACID compliance and stateless architecture.

Implementing Firebolt MERGE Statement · Product

Technical deep dive on the powerful MERGE SQL command, enabling simultaneous operations on a single table.

Is Self-Service BI a False Promise? Lei Tang of Fabi.ai Thinks So

Fabi.ai's Lei Tang on bridging the gap between data teams and business users with intelligent AI systems

Automatic Cache Warmup · Product

Faster queries from the start with smart cache loading on engine boot, upgrade, and scale

Firebolt Auror · Product

Firebolt built Auror to securely validate container images with low latency in Kubernetes clusters.

Eliminating Redundant Joins in Firebolt for Faster SQL · Product

Firebolt removes redundant joins to boost SQL performance and optimize complex subqueries.

Building Uber's AI Assistant: How Genie Revolutionizes On-Call Support with Paarth Chothani from Uber · Podcast

Uber's AI Genie resolves on-call pains using internal data. Spark powers its infrastructure; LLMs revolutionize databases.

Firing Up Firebolt’s Client Ecosystem · Product

This feature lets users avoid spending precious resources on merely maintaining a connection while their client is idle.

From Zero to 100M Users: Inside Notion’s Data Stack and AI Strategy with Sumit Gupta

Learn how Notion's Lead BI Engineer, Sumit Gupta, uses AI to revolutionize data workflows and generate customer insights.

Introducing Firebolt Core - Self-Hosted Firebolt, For Free, Forever · Product

Firebolt's CTO and VP of Engineering discuss the launch of Firebolt's self-managed version, Firebolt Core.

Making Firebolt Fast By Doing Practically Nothing · Product

Hear about the different methods deployed in Firebolt for reducing the number of scanned rows (aka pruning).

Live Engine Upgrades, Zero Downtime: The Firebolt Method · Product

Discover how Firebolt delivers seamless, no-downtime upgrades using shadow clusters and real-time performance verification to ensure peak reliability.

Querying Apache Iceberg with Sub-Second Performance · Product

Firebolt's new READ_ICEBERG capability does a lot of heavy lifting to provide low-latency access to your Iceberg tables.

How RisingWave Is Redefining Real-Time Data with Postgres Power · Podcast

Explore cutting-edge PostgreSQL innovations, distributed database architecture, and cloud-native data processing solutions with Yingjun Wu of RisingWave.

Unlocking Simplicity and Security: Firebolt’s New LOCATION Object · Podcast

Discover the new LOCATION object, a foundational improvement to Firebolt’s data access model.

GROUPING SETS as a pure planner rewrite? Yep, it's possible · Podcast

In this blog post you will learn how GROUPING SETS work and how Firebolt’s implementation uses smart query planning to execute them efficiently.

Exploring your data lake in Firebolt using just TVFs · Product

Discover how Firebolt implements SQL functions for data exploration.

Unlock Conversational Data Interaction: Firebolt MCP Server for Advanced LLM Integration · Product

Connect Firebolt to AI tools like Claude and Copilot using the new MCP Server to streamline workflows, run smart queries, and boost data engineering efficiency.

Decomposing Firebolt transactions · Product

Explore how Firebolt's transaction system maps to the four essential steps—Execute, Validate, Order, Persist. Learn how Firebolt uses MVCC, OCC, and Foundation

Beyond Database Optimization with AI · Podcast

Explore the innovative evolution of DuckDB and AI-driven database tech solutions with Hannes Mühleisen, CEO of DuckDB Labs.

Revolutionizing Data Governance with DataStrato’s Unified Open Source Approach · Podcast

Discover DataStrato’s unified open-source approach to data governance and simplifying data management with Lisa Cao.

Robust and efficient geospatial operations using snap rounding (Part III) · Product

We will explore in more detail how Firebolt implements robust operations on geospatial data.

AI and Data Movement: Trends and Best Practices with Estuary’s Daniel Pálma · Podcast

Explore the world of data engineering and marketing, real-time data integration, AI, and data movement with Daniel Pálma.

Architecture and Internal Representation of the GEOGRAPHY Data Type (Part II) · Product

Explore how Firebolt processes & optimizes GEOGRAPHY data using S2 cells, shape indexes, and query pruning for peak performance.

Building Geospatial Support in Firebolt (Part I) · Product

Implementing fast geospatial queries in Firebolt using the S2 Geometry Library.

Firebolt’s zero-copy clone · Product

Discover Firebolt’s Zero-Copy Clone feature: a cost-efficient way to clone massive tables instantly without duplicating data.

AI and Data Change Management with Chad Sanderson, CEO Gable AI · Podcast

In this episode of The Data Engineering Show, Chad Sanderson explores the world of data change management.

Tech Stacks and Tradeoffs: Xudo's Founder on Picking the Right Tools for BI Success · Podcast

Wouter Trappers shares his slightly unconventional path from philosopher to data consultant and engineer.

Data Rewind: Conversation Highlights from Zach, Matthew, Joe, and Krishnan · Podcast

Dive into key highlights from Firebolt's Data Rewind conversation series.

Building Customer Trust: A CISO's Perspective on Security and Privacy at Firebolt

Build trust with a CISO's perspective on Firebolt's security and privacy commitments.

The Resurgence of SQL: Insights from Ryanne Dolan from LinkedIn · Podcast

In this episode of The Data Engineering Show, Ryanne Dolan from LinkedIn joins the Bros to discuss LinkedIn's Hoptimator project.

Fuzzing Firebolt: Catching 0-days as fast as our query processor · Product

Learn how Firebolt identifies zero-day vulnerabilities as efficiently as its query processor.

How we built Firebolt · Product

Gain insights into how Firebolt was built to redefine cloud data performance and scalability.

Caching & Reuse of Subresults across Queries · Product

Enhance query performance with Firebolt's caching and subresult reuse features.

Making a Query Engine Postgres Compliant Part I - Functions · Product

Learn about making a query engine Postgres-compliant in part one of this in-depth series.

Engines: Online Scaling and Upgrades · Product

Firebolt engines provide multi-dimensional elasticity, letting customers achieve their desired price-performance without downtime.

Vector Databases Won’t Replace SQL - Andy Pavlo · Podcast

Andy Pavlo, Associate Professor at Carnegie Mellon University, delves into database internals and optimization.

How ZoomInfo transitioned from data graveyards to ROI-driven data projects · Podcast

Too often, expensive resources and man-hours are spent on dashboards no one uses, resulting in zero ROI.

Matthew Weingarten from Disney Streaming about Data Quality Best Practices · Podcast

Principles essential for data quality, cost optimization, and data modeling, as adopted by the world's leading companies

Joseph Machado, Senior Data Engineer at LinkedIn talks best practices · Podcast

Data engineering should be less about the stack and more about best practices. While tools may change, foundational principles will remain constant.

Professors Joe Hellerstein and Joseph Gonzalez on LLMs · Podcast

Joe Hellerstein and Joseph Gonzalez inspired generations of database enthusiasts and are now on the show

Megan Lieu on powerful notebooks that enable collaboration · Podcast

Megan Lieu on her approach to data advocacy and the power of notebooks, especially when they enable collaboration

Transitioning from software engineering to data engineering · Podcast

This time on The Data Engineering Show: Xiaoxu Gao, an inspiring Python and data engineering expert with 10.6K followers on Medium.

Vin Vashishta explains why we should stop using dashboards · Podcast

Vin Vashishta, the guy we all love to follow, has never seen a dashboard with positive ROI. He met the bros to talk about replacing BI dashboards with analytics.

Joe Reis and Matt Housley on the fundamentals of data engineering · Podcast

Joe Reis and Matt Housley joined the bros for some much-needed ranting, priceless data advice, and good laughs.

Bill Inmon, the Godfather of Data Warehousing · Podcast

As people in the data industry go, Bill Inmon is among the top, often seen as the godfather of the data warehouse.

Large scale data engineering at Momentive.ai - Meenal Iyer · Podcast

Meenal Iyer, VP Data at Momentive.ai, talks about enforcing collaboration in large organizations

Data engineering from the early 2000s till today - BlackRock · Podcast

When it comes to data management, have we come a long way since the early 2000s?

Zach Wilson on what makes a great data engineer · Podcast

After years of data engineering experience at Airbnb, Netflix, and Facebook, Zach Wilson is now focused on spreading the knowledge in EcZachly

How ZipRecruiter and Yotpo power self-service data platforms that work · Podcast

How ZipRecruiter and Yotpo build resilient self-service products that keep customers happy and engineers calm

Data Observability with Millions of Users - Barr Moses · Podcast

Barr Moses explains how to make sure your data is accurate in a world where so many different teams are accessing it

How Amplitude Engineers Process 5 Trillion Real-time Events · Podcast

Amplitude's cutting-edge data stack and how it processes 5 Trillion real-time events while dealing with mutable data

Making Observability a Key Business Driver · Podcast

80% of the code that you write doesn’t work on the first try. But knowing which 80% is not working is the real challenge

A ClickHouse Review from a Practitioner’s Point of View · Podcast

Sudeep Kumar, Principal Engineer at Salesforce considers the shift to ClickHouse as one of his biggest accomplishments.

The Creator of Airflow About His Recipe for Smart Data-Driven Companies · Podcast

Maxime Beauchemin, the CEO & Founder at Preset and Creator of Apache Superset and Airflow, told the Data Bros about his recipe for a smart data-driven company.

How Similarweb Delivers Customer Facing Analytics Over 100s of TBs · Podcast

According to Yoav Shmaria, VP R&D Platform at Similarweb, the best way to manage data warehouse costs is tagging

How Klarna Designed a New Data Platform in the Cloud · Podcast

While many corporations are “stuck” on-prem, Klarna made the move and today is a cloud-only company. Gunnar Tangring explains how.

How Eventbrite is Modernizing its Data Stack · Podcast

An episode about Eventbrite’s data stack modernization process, and how you get engineers to adopt new technologies

A Deep Dive into Slack's Data Architecture · Podcast

How the data platform evolved as Slack grew from a startup to an IPOed and then acquired company.

Transitioning Scopely’s 5.5 PB Data Platform to the Modern Data Stack · Podcast

Should data engineering AND BI be handled by the same people?

Getting Rid of Raw Data with Jens Larsson · Podcast

Why would you create ugly data? According to Jens Larsson, don’t even go near raw data.

How Zendesk engineers manage customer-facing data applications · Podcast

Ananth Packkildurai is Principal Software Engineer at Zendesk and runs one of the strongest newsletters in data

How are those data intensive customer facing apps engineered at Gong? · Podcast

Gong manages hundreds of thousands of videoconferences and millions of emails PER DAY, which add up to hundreds of TBs.

How Bolt Engineers Are Designing Its Next-Gen Data Platform · Podcast

Bolt's ride-hailing app serves 2B users globally and handles 500K queries daily. Erik Heintare shares how it's going to solve their biggest data challenges.

How did Agoda scale its data platform to support 1.5T events per day? · Podcast

Scaling a data platform to support 1.5T events per day requires complicated technical migrations and alignment between hundreds of engineers.

Diving Into GitHub's Data Stack · Podcast

It’s the mother of all development projects. You use it daily. And so do 65M developers around the world.

Building Data Products For Data Engineers · Podcast

What does a tech stack that always needs to be at the forefront of technology look like?

How Vimeo Keeps Data Intact with 85B Events Per Month · Podcast

How Vimeo handles Data Ops to deal with massive scale

How Substack's Data Platform Supports 500K Paying Subscribers · Podcast

How does Substack's data platform support 500K paying subscribers?

A Technical Deep Dive to Yelp's Data Infrastructure — with Steven Moy · Podcast

Steven Moy, Software Engineer at Yelp, has joined the Data Bros to discuss Yelp's Data Infrastructure on the Data Engineering Show Podcast.

How do Canva's engineers and analysts scale data platforms to keep up with growth? — with Krishna Naidu · Podcast

Canva is one of the hottest, if not the hottest, graphic design platforms out there. How are they handling growth? Krishna Naidu told the data bros.

How AppsFlyer manages scale without sacrificing performance · Podcast

Alexandra Sudilovski, Senior BI Expert at AppsFlyer, told The Data Engineering Show how AppsFlyer manages scale without sacrificing performance.