March 24, 2021

How to upgrade from Tableau extracts to a fast Tableau live connection

Tableau literally means “little table” in French, so perhaps it shouldn’t be such a surprise when I tell you that most of the Snowflake and Tableau deployments I’ve seen rely on little tables, or Tableau extracts, rather than a Tableau live connection.

If you are using Tableau extracts and want to see what a live connection looks like against 20TB of data, feel free to watch this Tableau demonstration and explanation.

The same is true with Redshift, not to mention Athena. In the case of Snowflake, there are two reasons. It’s not just that Snowflake can get expensive; Snowflake queries from Tableau also take too long. Just look at the Fivetran benchmark: queries against just 1 terabyte (TB) of data can take 8-11 seconds on average, and sometimes over a minute. Then there’s the SQL that Tableau generates, which can get pretty complex.

What analysts and data warehouse teams really want is a “Tableau vivant”. That may sound like a Tableau living the good life, but it literally means a living picture. Analysts want the specific data they need as they slice and dice or drill down, and they want the data live. But Tableau extracts don’t work for ad hoc or interactive analytics at any “reasonable” scale, which basically means a terabyte or more. Each extract generally captures a small subset of aggregated data, whatever can fit on a laptop. But when you are investigating an issue you don’t yet understand, you generally need to analyze the data in a “non-standard” way, outside the typical report or dashboard view.

Analysts don’t want to wait hours for a Tableau extract, let alone a second or third one for each new question they ask. Data warehouse teams don’t want to be overwhelmed by the number of extracts they need to generate as more and more reports and dashboards are created.

Once you need to perform interactive analytics on a terabyte or more of data, you generally need a Tableau live connection that can return queries in “sub-second” time, which in practice means 1-2 seconds or less. Tableau is not the problem. The Fivetran numbers show that the problem is the data warehouse.

How to enable a Tableau live connection

The only way to support a live connection is with a faster analytics engine. But it can’t just be fast. It has to be fast with Tableau SQL.

Several companies have turned on a Tableau live connection by adding Firebolt side-by-side with Snowflake, Redshift or Athena. But what makes Firebolt faster?

The first difference is faster data access. When you decouple storage and compute, you introduce a bottleneck: the network. Most other cloud data warehouses are not storage-aware. When they fetch data, they pull entire partitions or segments that might contain the rows they need over the network, and then process them locally. That’s a lot of data. Firebolt fetches only the data ranges it needs, which reduces data movement by 10x or more.
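To make the data-access point concrete, here is a minimal Python sketch of range-based fetching with a sparse index, compared with fetching whole partitions. The granule and partition sizes are made-up numbers, the column is assumed to be sorted, and the function names are hypothetical. This is not Firebolt’s storage format, just an illustration of why fetching smaller ranges moves less data over the network.

```python
from bisect import bisect_right

# Illustrative layout (assumed, not Firebolt's): a sorted integer column
# holding keys 0..ROWS-1, stored as small ranges ("granules") that are
# grouped into much larger partitions.
ROWS = 1_000_000
GRANULE_ROWS = 8_192
PARTITION_ROWS = 262_144  # 32 granules per partition
BYTES_PER_ROW = 8

# Sparse index: the first key stored in each granule.
SPARSE_INDEX = list(range(0, ROWS, GRANULE_ROWS))


def granules_needed(lo, hi):
    """Return indexes of the granules that may hold keys in [lo, hi]."""
    first = bisect_right(SPARSE_INDEX, lo) - 1
    last = bisect_right(SPARSE_INDEX, hi) - 1
    return list(range(first, last + 1))


def bytes_fetched_by_range(lo, hi):
    """Bytes moved when only the matching granules are fetched."""
    return len(granules_needed(lo, hi)) * GRANULE_ROWS * BYTES_PER_ROW


def bytes_fetched_by_partition(lo, hi):
    """Bytes moved when every overlapping partition is fetched whole."""
    partitions = range(lo // PARTITION_ROWS, hi // PARTITION_ROWS + 1)
    return len(partitions) * PARTITION_ROWS * BYTES_PER_ROW


if __name__ == "__main__":
    lo, hi = 300_000, 310_000
    print(granules_needed(lo, hi))             # [36, 37]
    print(bytes_fetched_by_range(lo, hi))      # 131072
    print(bytes_fetched_by_partition(lo, hi))  # 2097152
```

For keys 300,000 to 310,000, the sketch reads 2 granules (128 KiB) instead of one whole partition (2 MiB), a 16x reduction in this contrived layout. Real ratios depend on how the data is laid out and how selective the query is.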

The second difference is faster query execution. Firebolt has a lot of execution optimizations. For example, it performs query vectorization, which batches multiple rows together by column so they can be processed at once. It also uses indexing extensively: indexes on each table, join indexes, and aggregating indexes that precompute aggregations. Aggregating indexes are like materialized views done right: you don’t need to maintain them, and the query optimizer uses them automatically, without you having to rewrite a query to point to a different table.
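The aggregating-index idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Firebolt’s implementation: the `FactTable` class and its methods are hypothetical, the index holds a single SUM per grouping key, and the “optimizer” is one `if` statement. It shows the two properties described above: the precomputed result is maintained on every insert, and the same query is answered from the index whenever it matches, with no rewrite by the caller.

```python
from collections import defaultdict


class FactTable:
    """Toy fact table with one aggregating index (hypothetical design).

    The index holds a running SUM(measure_col) per value of group_col and is
    updated on every insert, so there is no separate refresh job to run.
    """

    def __init__(self, group_col, measure_col):
        self.rows = []
        self.group_col = group_col
        self.measure_col = measure_col
        self.agg_index = defaultdict(int)  # group value -> running sum

    def insert(self, row):
        self.rows.append(row)
        self.agg_index[row[self.group_col]] += row[self.measure_col]

    def sum_by(self, group_col, measure_col):
        """Answer SELECT group_col, SUM(measure_col) ... GROUP BY group_col.

        The caller issues the same request either way; this method decides
        whether the aggregating index can answer it.
        """
        if (group_col, measure_col) == (self.group_col, self.measure_col):
            return dict(self.agg_index)  # read precomputed totals
        totals = defaultdict(int)
        for row in self.rows:  # no matching index: fall back to a full scan
            totals[row[group_col]] += row[measure_col]
        return dict(totals)


if __name__ == "__main__":
    t = FactTable("region", "sales")
    t.insert({"region": "EU", "sales": 10, "units": 1})
    t.insert({"region": "US", "sales": 5, "units": 2})
    t.insert({"region": "EU", "sales": 7, "units": 3})
    print(t.sum_by("region", "sales"))  # {'EU': 17, 'US': 5}, from the index
    print(t.sum_by("region", "units"))  # {'EU': 4, 'US': 2}, from a full scan
```

Both calls return correct totals; only the first one avoids touching the raw rows, which is the point of keeping the index current at insert time.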

But the third difference is perhaps the most interesting: query optimization, which happens before execution. Firebolt spent months analyzing and optimizing for Tableau queries. Tableau SQL can be somewhat complex, and it often relies heavily on joins. Part of the performance improvement comes from how Firebolt builds the final physical query plan from the Tableau SQL.

For example, Firebolt can push down predicates and execute them before joins, filtering data sets down before they are joined. Firebolt is the only data warehouse that can push predicates down multiple levels, past multiple joins. Firebolt also replaces many aggregate calculations with precomputed results from aggregating indexes. For joins, Firebolt uses specialized join indexes to reduce the size and number of full scans, in many cases replacing them with direct lookups. The result has been 4-6000x faster query performance across queries, as benchmarked by customers.
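Predicate pushdown and join-index lookups are general query-optimization techniques, and a toy Python version shows why they help. The tables, column names, and the dictionary standing in for a “join index” are hypothetical, and this is not how Firebolt’s optimizer is built. The naive plan joins every row before filtering, while the optimized plan applies the fact-table filter before the join and replaces the scan of the dimension table with a direct key lookup.

```python
# Toy data (hypothetical): a fact table of orders and a dimension table of customers.
orders = [
    {"order_id": 1, "customer_id": 10, "amount": 50},
    {"order_id": 2, "customer_id": 11, "amount": 20},
    {"order_id": 3, "customer_id": 10, "amount": 75},
    {"order_id": 4, "customer_id": 12, "amount": 5},
]
customers = [
    {"customer_id": 10, "country": "FR"},
    {"customer_id": 11, "country": "US"},
    {"customer_id": 12, "country": "FR"},
]

# Hypothetical join index: customer_id -> customer row, built once ahead of time.
join_index = {c["customer_id"]: c for c in customers}


def naive_plan(min_amount, country):
    """Join every order with every matching customer first, then filter."""
    joined = [
        {**o, **c}
        for o in orders
        for c in customers  # scans the dimension table for every order
        if o["customer_id"] == c["customer_id"]
    ]
    return [r for r in joined if r["amount"] >= min_amount and r["country"] == country]


def optimized_plan(min_amount, country, index):
    """Push the fact-table predicate below the join, then probe the join index."""
    filtered_orders = [o for o in orders if o["amount"] >= min_amount]
    result = []
    for o in filtered_orders:
        c = index.get(o["customer_id"])  # direct lookup instead of a scan
        if c is not None and c["country"] == country:
            result.append({**o, **c})
    return result


if __name__ == "__main__":
    print([r["order_id"] for r in naive_plan(20, "FR")])                  # [1, 3]
    print([r["order_id"] for r in optimized_plan(20, "FR", join_index)])  # [1, 3]
```

Both plans return orders 1 and 3. The difference is that the optimized plan sends only three orders into the join and never loops over the customer table. On four rows that saves nothing measurable, but on billions of rows it largely determines how much data the join has to touch.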

In short, the combination of optimized data access, indexes, query optimization, and execution has allowed companies to turn on Tableau live connections and deliver sub-second performance every time.

Live the life of a Tableau Vivant

It’s not hard to live the fast life with Tableau and add a Tableau live connection. With Firebolt, most engagements have taken a few days to get the data in and prove out the performance and price-performance gains. Implementations have typically taken a few weeks.

If you’re still not convinced or have more questions, come watch live and see how Firebolt handles a Tableau live connection. Some of the Tableau dashboards you’ll see are interactive dashboards running complex queries against 20TB of data. Then get started on your own journey.
