Data visualisation with Streamlit and Metabase
May 15, 2023

A primer on analyzing semi-structured data (Part 2)

Allegro: Visualizing data with Streamlit & Metabase

With the data ingested (Part 1 of this blog post), let’s delve right into two popular frameworks to visualizing the data: 

  • Streamlit
  • Metabase

Streamlit

Streamlit is an open-source app framework. It allows you to build and share data apps written in Python with no front-end experience. 

All the coding required to build the data app can be done in a single Python file. We will create an additional file to store the Firebolt connection parameters. 

The following packages will be required and can be installed via pip:

  • Firebolt-sdk
  • Streamlit
  • Pandas
  • Altair
  • Matplotlib
  • Dotenv

Open your favorite editor and create two files:

  • .env - store the credentials to connect to Firebolt
  • app.py - code to generate the Streamlit app. 

.env file

app.py file

First, let’s import all the necessary packages and create a method that takes all the environment variables and connects to Firebolt. 

With the cursor object being defined, each visualization component will follow the same pattern:

  • Execute a query using the cursor object
  • Fetch all the results and store them into a Pandas Dataframe
  • Visualize the results using Streamlit methods.

Let’s first add a title to our app and delve right into building it! 

To start the application, you need to run the following command:

streamlit run app.py

Now navigate to http://localhost:8501/

NY Philharmonic Performance History

The filters area

First, let’s start with the filter area. We will only need a single filter based on the `Concert Date` field. The filter is going to leverage the Streamlit slider widget.

Next, using the two filter values, slider_min, and slider_max, let’s count how many distinct programs are within the range.

The Charts area

With the filter area created, let’s proceed with visualizing three pieces of data:

  • Number of programs per season as a bar chart
  • Popular composer per season using the Altair package
  • Popular venues as a pie chart using the Matplotlib package
Number of programs per season as a bar chart
Popular composer per season using the Altair package

Given the high number of composers each season, we would like to filter out only the ones that appear in more than 25 programs each season. 

Popular venues as a pie chart using the Matplotlib package

We will only filter venues with more than 100 programs in the given range.

The entire app.py file can be found here.

Metabase

Metabase is an open-source data visualization and business intelligence tool that simplifies the process of analyzing, querying, and sharing data. It offers a user-friendly interface for creating custom dashboards and reports, enabling non-technical users to explore their data and make informed decisions without the need for extensive coding or SQL knowledge.

Metabase comes in two flavors: 

  • Self-hosted
  • Cloud 

We’ll go the self-hosted route using Docker. Here is what you need to do to get it started:

  • Download the latest driver from here. (firebolt.metabase-driver-x.y.zz.jar)
  • Start the Metabase container

docker run -d -p 3000:3000 --name metabase metabase/metabase:v0.44.6.1

  • Copy the driver to the plugins folder on the container

docker ps # to find the <container_id>

docker cp firebolt.metabase-driver-x.y.zz.jar <container_id>:/plugins

  • Restart the docker image
  • Navigate in your browser to localhost:3000. You should see a similar image to the below one
  • Choose Firebolt as a data source and set up the connection

Great, let’s now try to replicate the app built with Streamlit. We will need the following:

  • Number of programs in a given timeframe
  • Number of programs per season in a given timeframe
  • Popular composer per season in a given timeframe
  • Popular venues as a pie chart in a given timeframe

From the top-right corner, click the New button and select SQL query:

We will reuse the queries from the Streamlit python file, and only change the WHERE clause to leverage Metabase variable syntax: concert_date > {{stard_concert_date}}::timestamptz

The {{ var }} syntax allows passing in variables to your SQL statements. Metabase automatically identifies the variables from your scripts and allows you to configure them. You can define the variable type (Text, Number, Date, Field Filter), what the default value should be, and if it’s required or not.

Running the query gives us the option to visualize the results either as a table or as a chart.

Let’s choose the Bar type chart for this query. 

We can now name the query and save it. Given that all four questions share the same pattern, we can duplicate the above tile for the remaining questions. 

  • Popular composer per season in a given timeframe

We will still go with the Bar chart, but stack the values (Settings -> Display -> Stack) 

  • Popular venues as a pie chart in a given timeframe


We can now proceed with building our first dashboard in Metabase!

New -> Dashboard -> Give it a name -> Create

Next, let’s add two filters that we’ll be connected to every tile/chart we created before. 

Filter -> Time -> Single Date 

Add (+) the four charts to the dashboard and click on the global filters to link to each report. 

Rearrange the tiles, and in less than 10 minutes you have a dashboard created with Metabase running locally!

Finale: Conclusions

If you’ve made it so far, congrats! We’ve seen how to load json data, analyze and transform it using Firebolt’s JSON and Array functions. Lastly, we’ve seen how to connect Firebolt to Streamlit and Metabase to visualize the data.

If you have any questions, head over to help.firebolt.io and ask!

Get your hands on Firebolt's fast Cloud Data Warehouse in our next interactive, instructor-led workshop.

Read all the posts

Intrigued? Want to read some more?

Curious to learn more?