Canva is one of the hottest graphic design platforms out there, if not the hottest. With 55 million active users and around 500 million dollars in annual revenue, Canva is an unstoppable powerhouse. They were recently valued at 16 billion dollars!
So how do Canva analysts and engineers scale their data platforms to meet the company's insane growth?
To help us find out, we recently invited Krishna Naidu—Data Engineer at Canva and expert in building large data platforms—to speak with us on The Data Engineering Show.
How much data does Canva deal with?
Canva processes a huge volume of data. When we spoke with Krishna, Canva's data warehouse held about 400 TB of data and ingested roughly 2 TB of raw data daily. Their biggest data set is their event tracking table, which tracks all of their analytics events.
How many people work on the data team?
At Canva, the software is always changing, so the company requires a large team to keep up. There are currently about 20 data engineers, 20 data scientists, and 40 data analysts—and the team is still hiring globally.
What does the data stack look like at Canva?
Canva has historically used both a data lake and a data warehouse, and will likely continue to use both in some form.
Streaming is the main source of incoming data. Incoming data lands in the data lake and is stored in Delta format, which serves as the foundation of Canva's data.
From there, the data is loaded into the data warehouse, which uses Snowflake.
They also keep a raw data lake. Currently, both the warehouse and the Delta Lake consume from the raw lake, but Krishna and the team are working on having Snowflake consume more from the Delta Lake instead, to cut down on duplicated ingestion.
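The benefit of that transition can be sketched in plain Python. This is a conceptual model only, with hypothetical names, not Canva's actual pipeline code: in the current topology both the Delta Lake and the warehouse independently re-read the raw lake, while in the target topology raw data is processed once into Delta and the warehouse consumes the curated result.

```python
def ingest(source_rows):
    """Simulate one ingestion pass: a consumer reads every row from its source."""
    return list(source_rows)

# A toy raw lake of tracked analytics events (hypothetical shape).
raw_lake = [{"event_id": i, "type": "design_opened"} for i in range(3)]

# Current topology: two consumers each process the raw lake independently.
delta_lake = ingest(raw_lake)
warehouse_current = ingest(raw_lake)
raw_reads_current = len(raw_lake) * 2  # every raw row is processed twice

# Target topology: raw -> Delta once, then the warehouse reads from Delta.
warehouse_target = ingest(delta_lake)
raw_reads_target = len(raw_lake)  # every raw row is processed once

print(raw_reads_current, raw_reads_target)  # 6 3
```

Either way the warehouse ends up with the same rows; the target topology simply halves the number of passes over the raw data in this toy example.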
Krishna was hired at Canva to revamp their data warehouse. As Canva began its explosive growth, the company struggled to scale the existing warehouse and needed a better way to get the performance and storage the business demanded.
What is the data team at Canva focusing on now?
Now that the warehouse has been revamped and contributions are up, Krishna and his team are working on enhancing analyst and engineering productivity.
With a team of 40 people who might be working in the development environment at any one time, things can get tricky. Using Snowflake, Krishna's team is working on a new workflow that will allow for better testing and rebuilding.
They are also prioritizing giving more control and ownership of the massive raw datasets to backend and frontend engineers, to enable contributions from the broader organization.
Listen to the full episode for more insights from Krishna, and subscribe to our YouTube channel to never miss a podcast episode.